Lecture Notes in 
Computer Science 1681 



David A. Forsyth Joseph L. Mundy 
Vito di Gesu Roberto Cipolla (Eds.) 



Shape, Contour 
and Grouping 
in Computer Vision 




Lecture Notes in Computer Science 1681 

Edited by G. Goos, J. Hartmanis and J. van Leeuwen 




Berlin 

Heidelberg 

New York 

Barcelona 

Hong Kong 

London 

Milan 

Paris 

Singapore 

Tokyo 








Series Editors 



Gerhard Goos, Karlsruhe University, Germany 
Juris Hartmanis, Cornell University, NY, USA 
Jan van Leeuwen, Utrecht University, The Netherlands 



Volume Editors 
David A. Forsyth 

University of California at Berkeley, Computer Science Division 
Berkeley, CA 94720, USA 
E-mail: daf@cs.berkeley.edu 

Joseph L. Mundy 

G.E. Corporate Research and Development 
1 Research Circle, Niskayuna, NY 12309, USA 
E-mail: mundy@crd.ge.com 

Vito di Gesu 

Palermo University, C.I.T.C. 

Palermo, Sicily, Italy 

E-mail: digesu@dipmat.math.unipa.it 

Roberto Cipolla 

University of Cambridge, Department of Engineering 
Cambridge CB2 1PZ, UK 
E-mail: cipolla@eng.cam.ac.uk 



Cataloging-in-Publication data applied for 

Die Deutsche Bibliothek - CIP-Einheitsaufnahme 

Shape, contour and grouping in computer vision / David A. 
Forsyth ... (ed.). - Berlin ; Heidelberg ; New York ; Barcelona ; Hong 
Kong ; London ; Milan ; Paris ; Singapore ; Tokyo : Springer, 1999 
(Lecture notes in computer science ; Vol. 1681) 

ISBN 3-540-66722-9 

CR Subject Classification (1998): I.4, I.3, I.5, I.2.10 
ISSN 0302-9743 

ISBN 3-540-66722-9 Springer-Verlag Berlin Heidelberg New York 



This work is subject to copyright. All rights are reserved, whether the whole or part of the material is 
concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, 
reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication 
or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, 
in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are 
liable for prosecution under the German Copyright Law. 

© Springer-Verlag Berlin Heidelberg 1999 
Printed in Germany 

Typesetting: Camera-ready by author 

SPIN 10704339 06/3142 - 5 4 3 2 1 0 Printed on acid-free paper 




Preface 



Computer vision has been successful in several important applications recently. 
Vision techniques can now be used to build very good models of buildings from 
pictures quickly and easily, to overlay operation planning data on a neurosur- 
geon’s view of a patient, and to recognise some of the gestures a user makes to 
a computer. Object recognition remains a very difficult problem, however. The 
key questions to understand in recognition seem to be: (1) how objects should 
be represented and (2) how to manage the line of reasoning that stretches from 
image data to object identity. 

An important part of the process of recognition - perhaps, almost all of it - 
involves assembling bits of image information into helpful groups. There is 
a wide variety of possible criteria by which these groups could be established - 
a set of edge points that has a symmetry could be one useful group; others 
might be a collection of pixels shaded in a particular way, or a set of pixels 
with coherent colour or texture. Discussing this process of grouping requires a 
detailed understanding of the relationship between what is seen in the image 
and what is actually out there in the world. 

The international workshop on shape, contour and grouping in computer 
vision collected a set of invited participants from the US and the EC to exchange 
research ideas. This volume consists of expanded versions of papers delivered at 
that workshop. The volume contains an extensive introduction consisting of three 
articles: one sketches out common cause on what is understood in recognition, 
and the other two indicate different possible agendas for future research in the 
area. The editors have encouraged authors to produce papers with a strong 
survey aspect, and this volume contains broad surveys of shape representation, 
model selection, and shading models as well as discursive papers on learning in 
character recognition and probabilistic methods in grouping. These papers are 
accompanied by more focussed research papers that give a good picture of the 
current state of the art in research on shape, contour and grouping in computer 
vision. 

It remains for the editors to thank the participants, the sponsoring institu- 
tions (which are listed separately), the management and staff of the Hotel Torre 
Artale, and Professor and Mrs. di Gesu who, with help from Roberto Cipolla, 
handled local arrangements admirably. 



August 1999 



David Forsyth 




Organization 



The international workshop on shape, contour and grouping was organised jointly 
by David Forsyth (U.C. Berkeley), Joe Mundy (GE Center for Research and De- 
velopment), Vito Di Gesu (Palermo University) and Roberto Cipolla (Cambridge 
University). 

Sponsoring Institutions 

The National Science Foundation, 1601 Wilson Blvd, Arlington, VA 22230, USA 
under grant IIS-9712426 

GE Center For Research and Development, 1 Research Circle, Niskayuna, NY 
12309 

The Centro Interdipartimentale Tecnologie della Conoscenza (C.I.T.C.), Palermo 
University, Palermo, Sicily 




Table of Contents 



I Introduction 

Introduction 3 

David Forsyth (U.C. Berkeley) and Joe Mundy (GE Corporate 
Research and Development) 

An Empirical-Statistical Agenda for Recognition 9 

David Forsyth (U.C. Berkeley) 

A Formal-Physical Agenda for Recognition 22 

Joe Mundy (GE Corporate Research and Development) 



II Shape 

Shape Models and Object Recognition 31 

Jean Ponce, Martha Cepeda, Sung-il Pae and Steve Sullivan (UIUC) 

Order Structure, Correspondence, and Shape Based Categories 58 

Stefan Carlsson (KTH - Stockholm) 

Quasi-Invariant Parameterisations and Their Applications in Computer 
Vision 72 

Jun Sato (Nagoya IT) and Roberto Cipolla (Cambridge) 



III Shading 

Representations for Recognition under Variable Illumination 95 

David J. Kriegman (UIUC), Peter N. Belhumeur, and 
Athinodoros S. Georghiades (Yale) 

Shadows, Shading, and Projective Ambiguity 132 

Peter N. Belhumeur (Yale), David J. Kriegman (UIUC), and 
Alan L. Yuille (Smith-Kettlewell) 



IV Grouping 

Grouping in the Normalized Cut Framework 155 

Jitendra Malik, Jianbo Shi, Serge Belongie and Thomas Leung 
(Berkeley) 















Geometric Grouping of Repeated Elements within Images 165 

Frederik Schaffalitzky and Andrew Zisserman (Oxford) 

Constrained Symmetry for Change Detection 182 

R.W. Curwen and J.L. Mundy (GE Corporate Research and 
Development) 

Grouping based on Coupled Diffusion Maps 196 



Marc Proesmans (Leuven) and Luc Van Gool (Leuven and ETH) 



V Representation and Recognition 



Integrating Geometric and Photometric Information for Image Retrieval 217 
Cordelia Schmid (INRIA), Andrew Zisserman (Oxford), and 
Roger Mohr (INRIA) 

Towards the Integration of Geometric and Appearance-Based Object 
Recognition 234 

Joe Mundy and Tushar Saxena (GE Corporate Research 
and Development) 

Recognizing Objects Using Color-Annotated Adjacency Graphs 246 

Peter Tu, Tushar Saxena, and Richard Hartley (GE Corporate 
Research and Development) 

A Cooperating Strategy for Objects Recognition 264 



Antonio Chella, Vito Di Gesu, Ignazio Infantino, Daniela Intravaia, 
and Cesare Valenti (Palermo) 



VI Statistics, Learning and Recognition 



Model Selection for Two View Geometry: A Review 277 

Philip H.S. Torr (Microsoft) 

Finding Objects by Grouping Primitives 302 

David Forsyth, John Haddon, and Sergey Ioffe (U.C. Berkeley) 

Invariant Object Recognition with Gradient-Based Learning 319 

Yann LeCun, Patrick Haffner, Leon Bottou (AT&T Labs - Research), and 
Yoshua Bengio (Universite de Montreal) 

Author Index 347 








Introduction 



David Forsyth and Joe Mundy 



Computer Science Division, U.C. Berkeley, Berkeley, CA 94720, USA 
daf@cs.berkeley.edu, 
http://www.cs.berkeley.edu/~daf 



Abstract. Our understanding of object recognition can address the 
needs of only the most stylised applications. There is no prospect of 
the automated motorcars of Dickmanns et al. knowing what is in front 
of them anytime soon; searchers for pictures of the pope kissing a baby 
must search on a combination of text, guesswork and patience; current 
vision based HCI research relies on highly structured backgrounds; and 
we may safely guess that the intelligence community is unlikely to be able 
to dispense with image analysts anytime soon. This volume contains a 
series of contributions that attack important problems in recognition. 



1 What We Do Well 

n solv som p o 1 ms th w 11. no tun t ly th s p o 1 ms s m to 
onn t only th t nuously w th pot nt 1 ppl t ons o o j t o - 
n t on. Th s s us 11 u nt 1 o thms oojt ontonll mo 1 
o n t on s t xplo t on o o spon n . h Ion s to sm 11 
num o typ s hwthh t st nvulms hvous. uho 
th s m t 1 s (o shoul ) ommon us n w 11 v w only v y 

fly- 



1.1 Geometric Detail for Point-like Primitives 

o nt-1 kpmtvspoj tlkpo nts; po nts p oj t to po nts 1 n s to 1 n s 
on s to on s t . w th no mo ompl x om t h v ou th n o lu- 
s on. Th s m ns th t thou h o j ts n look nt om nt v ws 

th s h n s h hly st u tu . o po nt-1 k p m t v s o spon n s 

twnm nojt tusn s h nv ous w ys o t 1 
om t mo Is ( s mpl o th s 1 1 tu n lu s 2 3 6 7 9 10 

11 18 19 23 2 25 ; th hun sopps Inwthv ntlo- 

thms o sp t hn 1 ssu s). Typ 1 1 o thms o th s 1 ss n usu lly 

n nst n s wn om sm 11 num oojt mo Is (som t m s om 

p m t mis) nst k oun o mo t lutt n sp t th 
ts o o lus on. o 1 ms n lu th st t on to x t om t y; th 1 m- 
t num o p m t V s; th ulty o u 1 n su nt pp op t mo Is; 

th n 1 un 1 1 ty o V t on n s m nt t on; nth st t on to 

sm 11 num o mo Is. 



D.A. Forsyth et al. (Eds.): Shape, Contour ..., LNCS 1681, pp. 3-8, 1999. 
© Springer-Verlag Berlin Heidelberg 1999 







1.2 Some Cases of Curved Primitives 

Th ulty w th u V su s s th t thou h th h n o outl n w th 
V wpo nt s h hly st u tu th s st u tu s n lly ompl t — so om- 
pl t th t t 1 o s s lly unm n 1 . t 1 s known 

out p t ul s s non p t ul ly p t 1 t p s nt ; 11 th s n o m t on 
t n s to su st th t given a geometric class outl n s pow ul oust nt on 
om t y n V wpo nt. Th pp to 1 ss-sp onst nts on out- 

In s o 11 s s thou h only v y w us ul s s known. Th s p tu 
s ompl t y th s n o ny k n o ov nt pp ox m t on th o m. 

know no us ul om t th o ms out th outl n s o su s th t 1- 

most” Ion to p t ul 1 ss n 11 n t ons th t su h th o ms 

ult to t. 



1.3 Template Matching 

T mpl t m t h n wo ks th w 11 on som knso ontonpolm 
( n n ont 1 s s oo x mpl ; wh 1 som t Is n wo k 

out th p o 1 m s ut w 11 un stoo 1 8 1 20 21 16 ). u t mpl t 

m t h n h V s poo ly un h n o sp t n o Hum n t on ut th s 
n solv y opt np mt mlsot mpl t s ( . . 12 13 15 ). 
Th s pp o h u s o j ts to pp on th own n om s unw 1 y 

o 11 ut w so om. o mo ompl x s s mo Is n ons st 

o ompos tso p mtvs wh h th ms Iv s om p m t m 1 s o 

t mpl t s ( . . 1 5 8 1 17 22 ). 

2 What This Volume Describes 

The authors of this piece see two quite different agendas for future research on 
object recognition. These agendas have been sketched in the next two papers. 
In the first, Forsyth argues that the main current difficulty in building 
practical recognition systems is the poor management of uncertainty within those 
systems; some sort of Bayesian formalism is long overdue. He takes the position 
that many difficult issues (for example, what is an appropriate representation 
for a particular set of objects, or how should distinct cues be used to come up 
with a single recognition strategy) are essentially empirical and statistical 
in nature, and that the community should attempt aggressively to master and apply 
various statistical methods. In the second, Mundy argues that empirical methods 
offer no fundamental breakthrough, but that a better understanding of 
appropriate physical and formal models (for example, an understanding of the 
relationship between the geometry of objects, their surface properties and their 
image appearance) might. The rest of this volume consists of contributions 
from leading researchers dealing with shape, shading, grouping, cue integration 
and recognition. Each section contains both research papers and papers with 
a review and introductory material. 







2.1 Shape 

o th V st m jo ty o o n t on p o 1 ms sh p s n mpo t nt u . n 
Shape models and object reeognitionj' n on n oil u s s th 
u nt st t o th t n sh p mo 11 n . Th p p monst t s v ty 
o p s nt t ons o o n t on n In n th wo k n o th o 

n Is yin . Th y s p s nt tons nt ms o p ts” n show 

on w y to t non 1 ompos t on o n o j t nto p ts om n m 

n lly th p p s uss s how th ts o v w n t on th ms Iv s 

t y m si. 

The relationship between image measurements and object shape is complicated. 
This relationship has combinatorial as well as geometric aspects, as Stefan 
Carlsson shows in "Order structure, correspondence and shape based categories." 
The order structure of groups of points does not change arbitrarily as the groups 
are projected to an image; this fact can be used to reason about the object in 
view. 

In uvs otnsujtto som o m o oup t on o th y 

snnnm . stn mhn sm o s ount n th s oup t on s 

to p s nt u V y plot o nv nt p op t s nst n nv nt p - 

m t . Th s pp o h usu lly u s th t on 1 to m su n n on- 

V n nt num o v t v s. n Quasi-invariant parametrisations and their 

applications in computer vision” to n poll show how to us u s - 
nv ntp mtstono psntn uvs. Th s h s th v nt th t 

w V t V s n m su on s w 11 n to pt som v t on n 

th p s nt t on. 

2.2 Shading 

Illumination is an important source of the variation in object appearance. In 
"Representations for recognition under variable illumination," Kriegman, 
Belhumeur and Georghiades describe a device called the illumination cone, which 
describes the appearance of an object under all possible illuminants. They show 
that this convex cone can be used to recognise faces, despite substantial shading 
variations which confound the usual strategies. This theory does not treat 
shadows, which are dealt with in "Shadows, shading, and projective ambiguity," by 
Belhumeur, Kriegman and Yuille. Objects that give the same shadow pattern in a 
fixed view are geometrically equivalent up to a generalised bas-relief ambiguity. 
Furthermore, objects can be reconstructed up to this ambiguity from multiple 
images under multiple unknown light sources. 

2.3 Grouping 

Grouping is the process of assembling image components that appear to belong 
together. There are a variety of reasons that components may belong together. 
In "Grouping in the normalized cut framework," Malik and colleagues discuss a 
mechanism that segments an image into components that satisfy a global 
goodness criterion. The mechanism is attractive, because (as they show) it can be 
used for a variety of different grouping cues, including intensity, texture, 
motion and contour. 

noth son th t m ompon nts m y Ion to th s th t th y 1 

on th s m pi n n th wo 1 . n Geometric grouping of repeated elements 

within images^ h 1 tzky n ss m n show how to t m n wh n 

p tt n ons sts o pi n 1 m nts p t o n to v ty o ul s. Th y 

t m n th s 1 m nt o th p tt n n n th n oust u t th p tt n 

us p 1 1 on ul s on th pi n 1 to p 1 1 on ul s n th m n 
w 11 n w y. 

n s t 11 1 m ppl t ons oth th pos o th oun pi n n th 1- 
t on o th m usu lly known. Th s m ns th t on n t 11 wh th 

o j ts ly n on th oun pi n pp to h v symm t y s u w n n 
un y monst t n Constrained symmetry for change detection^ Th s u 
m k s t poss 1 to oup to th nt st n m nts wh h 1 k ly 

to h V om om o x mpl hum n t ts. 

n Grouping based on coupled diffusion mapsf o sm ns n n ool 
s th us o n sot op us on p o ss s o oup n . n th s pp o h 

p X Is onn t y us on p o ss th t s mo to p v nt smooth n 
ov 1 nts. Th y show x mpl s wh th s p o ss s us os m n- 

t t on o t t n symm t s o st os op oust u t on n o mot on 

onst u t on. 

2.4 Representation and Recognition 

In "Integrating geometric and photometric information for image retrieval," 
Schmid, Zisserman and Mohr describe a local representation of images in terms of 
interest points. These interest points are defined using photometric information; 
collections of interest points yield a representation that incorporates 
information about shape and about photometry. This representation can be used to 
match query images against an image database. They show how the representation 
can be extended to 3D curves, using the osculating plane of the curve to obtain a 
matching process that can match 3D curves between small baseline stereo pairs. 

un y n x n s noth pp o h o nt t n photom t- 

n om t n o m t on n Towards the integration of geometric and 
appearance-based object recognition.” Th y p opos us n t mo Is o su - 
htn ss to n x o j t nt ty n omp n n s us n th n- 
N y mo 1 o su fl t n to 1 t . Th s pp o h s xt n to 

olou m s n Recognising objects using color annotated adjacency graphs,” 
yTu xn n tly. nthspp ojts psnt y j ny 

phs o olou s wh h mth usn phmth. Th p- 

p o h us s m tho om In 1 o omput n ph omp t Its 

th t s som wh t m n s nt o th m tho o \'k et al.. 

In "A cooperating strategy for object recognition," Chella, Di Gesu, Infantino, 
Intravaia and Valenti describe a complete recognition system. Objects are 
represented using the interest points of a discrete symmetry transform; the 
system uses cooperating agents to mediate the crucially important relationship 
between top-down and bottom-up information flow. 



2.5 Statistics, Learning, and Recognition 

Methods from statistics and from statistical learning theory are starting to have a 
substantial impact on the practice of computer vision. A classical statistical 
problem that turns up in many different vision applications is model selection: 
from which of several models was the data set obtained? This issue must be 
dealt with in recognition, where it is often known as verification (the issue is 
"do the pixels in this region come from an object or the background?"), in 
structure from motion ("what kind of camera produced this scene?") and in a 
variety of other areas of vision. Torr reviews this topic in "Model selection 
for two view geometry: a review." 

Forsyth, Haddon and Ioffe describe probabilistic algorithms for object 
recognition in "Finding objects by grouping primitives." These algorithms are 
structured around the use of simple primitives. Firstly, people and animals are 
represented as cylinders, and can then be found by a grouping process that 
assembles cylinders that together "look like" a person; secondly, folds in cloth 
are found using a classifier that recognizes their appearance, and then the Markov 
chain Monte Carlo method is used to group them into assemblies that look like 
buckle patterns in clothing. It is usually difficult to know what to use as a 
primitive in this sort of work; the paper argues that the criteria are 
statistical in nature, and that primitives could be learned from data. 

A more total commitment to learning appears in "Invariant Object Recognition 
with Gradient-Based Learning," by LeCun, Haffner, Bottou and Bengio. Images of 
handwritten characters are filtered by a sequence of filters at various scales, 
and the filter outputs passed to a neural net classifier. Not only the classifier, 
but the filters themselves, are learned from example data using a procedure 
known as gradient based learning. Convolutional neural networks are then 
tied together to yield a space displacement network that can read sequences of 
handwritten characters. Information about the probabilistic structure of 
handwritten numbers is incorporated using a graph transformer network. 

References 

[1] M.C. Burl, T.K. Leung, and P. Perona. Face localisation via shape statistics. In 
Int. Workshop on Automatic Face and Gesture Recognition, 1995. 

[2] O.D. Faugeras and M. Hebert. The representation, recognition, and locating of 
3-D objects. International Journal of Robotics Research, 5(3):27-52, Fall 1986. 

[3] D.A. Forsyth, J.L. Mundy, A.P. Zisserman, C. Coelho, A. Heller, and C.A. 
Rothwell. Invariant descriptors for 3d object recognition and pose. PAMI, 
13(10):971-991, 1991. 

[4] W.E.L. Grimson and T. Lozano-Perez. Localizing overlapping parts by searching 
the interpretation tree. IEEE Trans. Patt. Anal. Mach. Intell., 9(4):469-482, 1987. 







[5] C-Y. Huang, O.T. Camps, and T. Kanungo. Object recognition using appearance- 
based parts and relations. In IEEE Conf. on Computer Vision and Pattern Recog- 
nition, pages 877-83, 1997. 

[6] D.P. Huttenlocher and S. Ullman. Object recognition using alignment. In Proc. 
Int. Conf. Comp. Vision, pages 102-111, London, U.K., June 1987. 

[7] D.J. Kriegman and J. Ponce. On recognizing and positioning curved 3D objects 
from image contours. IEEE Trans. Patt. Anal. Mach. Intell., 12(12):1127-1137, 
December 1990. 

[8] T.K. Leung, M.C. Burl, and P. Perona. Finding faces in cluttered scenes using 
random labelled graph matching. In Int. Conf. on Computer Vision, 1995. 

[9] D. Lowe. Three-dimensional object recognition from single two-dimensional im- 
ages. Artificial Intelligence, 31(3):355-395, 1987. 

[10] J.L. Mundy and A. Zisserman. Geometric Invariance in Computer Vision. MIT 
Press, Cambridge, Mass., 1992. 

[11] J.L. Mundy, A. Zisserman, and D. Forsyth. Applications of Invariance in 
Computer Vision, volume 825 of Lecture Notes in Computer Science. Springer-Verlag, 
1994. 

[12] H. Murase and S. Nayar. Visual learning and recognition of 3D objects from 
appearance. Int. J. of Comp. Vision, 14(1):5-24, 1995. 

[13] S.K. Nayar, S.A. Nene, and H. Murase. Real time 100 object recognition system. 
In Int. Conf. on Robotics and Automation, pages 2321-5, 1996. 

[14] M. Oren, C. Papageorgiou, P. Sinha, and E. Osuna. Pedestrian detection using 
wavelet templates. In IEEE Conf. on Computer Vision and Pattern Recognition, 
pages 193-9, 1997. 

[15] H. Plantinga and C. Dyer. Visibility, occlusion, and the aspect graph. Int. J. of 
Comp. Vision, 5(2):137-160, 1990. 

[16] T. Poggio and Kah-Kay Sung. Finding human faces with a gaussian mixture 
distribution-based face model. In Asian Conf. on Computer Vision, pages 435- 
440, 1995. 

[17] A.R. Pope and D.G. Lowe. Learning object recognition models from images. In 
Int. Conf. on Computer Vision, pages 296-301, 1993. 

[18] L.G. Roberts. Machine perception of three-dimensional solids. In J.T. Tippett 
et al., editors, Optical and Electro-Optical Information Processing, pages 159-197. 
MIT Press, Cambridge, 1965. 

[19] K. Rohr. Incremental recognition of pedestrians from image sequences. In IEEE 
Conf. on Computer Vision and Pattern Recognition, pages 9-13, 1993. 

[20] H.A. Rowley, S. Baluja, and T. Kanade. Human face detection in visual scenes. 
In D.S. Touretzky, M.C. Mozer, and M.E. Hasselmo, editors, Advances in Neural 
Information Processing 8, pages 875-881, 1996. 

[21] H.A. Rowley, S. Baluja, and T. Kanade. Neural network-based face detection. In 
IEEE Conf. on Computer Vision and Pattern Recognition, pages 203-8, 1996. 

[22] C. Schmid and R. Mohr. Local grayvalue invariants for image retrieval. IEEE 
Transactions on Pattern Analysis and Machine Intelligence, 19(5):530-534, May 
1997. 

[23] D.W. Thompson and J.L. Mundy. Three-dimensional model matching from an 
unconstrained viewpoint. In IEEE Int. Conf. on Robotics and Automation, pages 
208-220, Raleigh, NC, April 1987. 

[24] S. Ullman. High-level Vision: Object Recognition and Visual Cognition. MIT 
Press, 1996. 

[25] S. Ullman and R. Basri. Recognition by linear combination of models. IEEE 
Trans. Patt. Anal. Mach. Intell., 13(10):992-1006, 1991. 




An Empirical-Statistical Agenda for Recognition 



David Forsyth 

Computer Science Division, U.C. Berkeley, Berkeley, CA 94720, USA 
daf@cs.berkeley.edu, 
http://www.cs.berkeley.edu/~daf 

This piece first describes what I see as the significant weaknesses in current 
understanding of object recognition. We lack good schemes for: using unreliable 
information (like radiometric measurements) effectively; integrating 
potentially contradictory cues; revising hypotheses in the presence of new 
information; determining potential representations from data; and suppressing 
individual differences to obtain abstract classes. These problems are difficult, 
but none are unapproachable, given a change of emphasis in our research. 

All these important problems have a statistical flavour to them. Most involve 
a change of emphasis, from the detailed study of specific cues to an investigation 
of techniques for turning cues into integrated representations. In particular, all 
have a statistical flavour, and can be thought of as inference problems. I show an 
example that suggests that methods of Bayesian inference can be used to attack 
these difficulties. 

We have largely mapped out the geometrical methods we need. Similarly, 
all the radiometric information that conceivably could be useful already exists. 
I believe that the next flowering of useful vision theories will occur when we 
engage in an aggressive study of statistics and probabilistic modelling, 
particularly methods of Bayesian inference. 

1 What We Do Badly 

The weaknesses in current understanding of object recognition all appear to come 
from our conception of an object model as a passive, unstructured repository of 
detailed geometric information. 



The literature is rich with individual cues to object identity, from surface colour 
to geometric primitives. It is unusual to have these cues agree on anything; the 
resulting embarrassment is avoided by not comparing the cues, or by ignoring 
cues that measure colour or texture. This is a serious mistake. It is quite 
clear that it is generally better to have more cues, even if some are more reliable 
than others. 

Our use of colour and shading information is still uncomfortably weak, 
probably because we don't have models that handle (or ignore) the nasty physical 
effects of interreflection or colour bleeding. I am not aware of the use of texture 
in recognition, except in template matching approaches. Texture doesn't really 
lend itself to direct template matching (not every tiger has every stripe in the 
same place) but it can be described, and the description contains information. 
We ignore this information, probably because we don't have mechanisms 
for integrating it with other cues, and because we haven't properly formulated 
models that cope with individual variations (abstraction again!). 

D.A. Forsyth et al. (Eds.): Shape, Contour ..., LNCS 1681, pp. 9-21, 1999. 
© Springer-Verlag Berlin Heidelberg 1999 



Performing recognition at a properly abstracted level (where the "person" or 
"bicycle" decision is made before the species, height or manufacturer are known) 
is something we know nothing about. We know from computer science that 
hierarchies are the proper way to access large numbers of objects efficiently, and 
there is some evidence [20, 21] that people make different kinds of decision about 
object identity ("category-level" and "instance-level" discrimination), suggesting 
the presence of a hierarchy. In turn, this suggests that abstraction may help with 
efficiency. 

All current recognition algorithms perform recognition at the level of 
individual instances in all the models; any category-level recognition is performed 
later. This leads to the characteristic, ludicrously inefficient, search through 
models. Implicitizing this search by using geometric invariants simply creates 
clogged hash tables; it does not solve the problem, because it does not scale. 

3 

There is a suspicious profusion of white objects on black backgrounds in the 
current recognition literature (significant variants include multihued objects on 
green backgrounds and black objects on white backgrounds). This is a product 
of the view that segmentation and recognition are distinct problems, and that, 
when the early vision community gets around to solving segmentation, everything 
will be ok. The view is pernicious, and the practices it encourages are dubious. 
r ognition ytmt toprt t nillvlof tr tion woul r w 
ro g n ri i tin tion tw n o j t t r t; knowl g of t kin of 

i tin tion i pr i ly w t t t rting t g of gm nt tion n . n ot r 
wor o j t r gm nt inwol utyrojt;w oul rig 

t gm nt tion pro to n t kin of vi n t t 1 to t o j t 
w r looking for. 

i vi w of gm nt tion i u u 1 in pr ti t i i w t int r t op r- 
tor lin n oni r 11 out ut ploring t full pow r of t vi w 
n t n y u w v only t v gu t not ion of w t t vi n 
oul . ol -f ion vi w i t t o j t oul mo 11 om- 

po it of primitiv t t rm p rt” i oft n u n t t n ing im g 

vi n of t pr n of t primitiv i w t gm nt tion i out 2 . 

primitiv mig t p ( n t / / on/ t . t ) ut 

t y mig t 1 o r t ri ti ing or olour v nt . tr ngt of t i 

vi w i t t on n imly ow to uil n i 1 r ognition y t m lik 







t i on nil r p t gm nt tion pro t t n v ry g n r 1 wi ly 

u primitiv rtnt n mlt primitiv into 1 rg r n 1 rg r r - 
gion of ini g vi n . gm nt tion i pr ti 1 u at each stage, 

we know what we ’re looking for. g in t i i ol n w ut i Uy nil 
n w for mpl now t fu out pp r n r ognition i 

own t t ommunity i moving v ry qui kly in t i ir tion 9 pr i ly 
u it i t only w y to 1 wit gm nt tion n ompl mo 1 

w kn it t w on t know w t t primitiv r w t t mo 1 

look lik or ow to nil t mo 1 . 



It would be a good thing if our recognition systems could recognise many objects, 
and if the recognition process did not require substantial re-engineering each time 
a new object was encountered. Ideally, models of new objects could be obtained 
by showing the system various instances in various views. This is not necessarily 
an argument for statistical learning theory; these properties are even built 
into geometric reasoning systems [22], although doing so generally requires some 
form of implicit statistical reasoning. 

Selection of measurements, broadly interpreted, is poorly understood. 
A typical current recognition system will use some set of geometrical 
measurements to determine whether an object is present; of course, there will be 
alternative sets of measurements that could be used. Which ones should we use, and 
why? This is a narrow version of the problem of primitives: which primitives 
should we use in recognition, and why? Both problems appear in the abstract to be 
model selection problems. 

u 

The state of verification is a disgrace; either we shouldn't do it (which should be 
preferred, on the grounds that we are only doing it because we haven't assembled 
much of the available evidence for our hypothesis before testing it) or we should 
do it properly. The current practice of counting edge points and crossing fingers 
simply isn't good enough. The problem is one of hypothesis testing (or of discrete 
model selection): to what degree does the image evidence in this region support 
the following object hypothesis? This is a fairly straightforward problem, with a 
relatively straightforward statistical solution: model selection again. 

2 A Brief Sketch of Bayesian Inference 

Probability provides a mechanism for comparing instances with a collective of 
previous examples; this mechanism makes it possible to combine evidence from 
various sources. The process of comparison is through a probabilistic model of 
the measurements produced, given the state of the world. For example, we might 
take a large number of pictures of sunsets, and estimate 

    P(big red blob | picture is sunset) 

by frequency estimates. This number could be interpreted either as a statement 
of expected frequencies, or as a "degree of belief" that a big red blob will appear 
in a picture of a sunset. It gives the probability of measurements, given the state 
of the world; this term is often referred to as the conditional or likelihood. 

The main use for probabilistic models in recognition is inference. Generally, 
we expect to have a probabilistic model of the state of the world before we 
measure it. This is the prior: in the example, the probability that a picture 
taken from our collection is a picture of a sunset, or 

    P(picture is sunset) 

Bayesian philosophy says that all our knowledge of the state of the world 
is encapsulated in the posterior: the probability of world states, given 
observations. The posterior accounts for the effect of observations on the 
probability that (in our example) a picture taken from our collection is a picture 
of a sunset. 

By Bayes' rule, the posterior is proportional to the product of the prior and the 
likelihood, so we have 

    P(picture is sunset | big red blob) ∝ P(big red blob | picture is sunset) 
                                          × P(picture is sunset) 
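
The proportionality above can be made concrete with a small numerical sketch. The code below is an illustration only, not part of the original paper: the picture counts, the prior, and the helper names `likelihood_from_counts` and `posterior` are all hypothetical. It estimates each likelihood as a frequency, multiplies by the prior, and normalises so the posterior sums to one.

```python
def likelihood_from_counts(n_with_blob, n_pictures):
    """Frequency estimate of P(big red blob | state) from labelled pictures."""
    return n_with_blob / n_pictures


def posterior(prior, likelihoods):
    """Bayes' rule: posterior is proportional to likelihood times prior.

    prior:       dict mapping world state -> P(state)
    likelihoods: dict mapping world state -> P(measurement | state)
    """
    unnormalised = {s: likelihoods[s] * prior[s] for s in prior}
    evidence = sum(unnormalised.values())  # P(measurement)
    return {s: v / evidence for s, v in unnormalised.items()}


# Hypothetical collection: 10% of pictures are sunsets.
prior = {"sunset": 0.1, "other": 0.9}

# Hypothetical counts: 80 of 100 sunsets show a big red blob,
# 45 of 900 other pictures do.
likelihoods = {
    "sunset": likelihood_from_counts(80, 100),
    "other": likelihood_from_counts(45, 900),
}

post = posterior(prior, likelihoods)
print(post)
```

With these assumed numbers, observing a big red blob raises the probability of "sunset" from the prior of 0.1 to a posterior of about 0.64, which is the sense in which a generative model plus a prior yields a recognition model.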

Most computer vision researchers have seen this expression many times without 
any great delight. Its great importance is that generative models, which give 
the way that data is produced, given the state of the world, can be turned 
into recognition models just by multiplying by the prior. One common objection, 
that priors can be arbitrary, is, I think, empty; all but the silliest choices 
of prior are overwhelmed by data in the kinds of problem we wish to solve. 
The real problem is that we need to be able to compute with the resulting posterior, 
and that is where difficulties arise. 

u 

The simple relationship between generative models and inference is the 
attraction of Bayesian models. If one accepts the basic hypothesis of the Bayesian 
philosophy, that the posterior encapsulates our knowledge of the world, 
information integration is simple. One just forms the posterior corresponding 
to the measurements available. No additional complications appear, because the 
generative model (or likelihood, which describes the process that leads to the 
measurements) is usually easy to obtain. 

In the following piece, Mundy makes the case that there is no canonical 
structure of class abstraction, and so no basis for dealing with the world 
at any level other than that of instances. This may well be sound philosophy, 
but it is impractical system architecture. The advantage of class abstraction is 
basically that one kind of measurement can do multiple duty. Thus, it is worth 
finding extended regions with nearly parallel sides, because there are an awful lot 
of things that look rather like cylinders. Class hierarchies appear to offer 
efficient representations, by making a relatively small number of decisions to 
identify a relatively large number of objects. If this promise can be realised, the 
fact that the hierarchy is artificial is irrelevant. 

Probabilistic models simplify constructing class hierarchies, because they can 
encode explicitly the variation between instances of a class. This yields an 
immediate solution to an old problem with geometric primitives: it is hard to prove 
anything useful about objects that are not exactly instances of the primitive 
class, but vary only slightly from instances. The solution is to build likelihood 
models around the measurement. Thus, for example, if we are looking for human 
limbs, we can benefit from the fact that the outlines are not only not straight 
and parallel, but the way that they differ from being straight and parallel is 
structured. 

Probabilistic models may also help to determine appropriate primitives. I do 
not agree with Mundy's argument that primitive decomposition is, or even 
should be, canonical. It seems more constructive to view decomposition into 
parts or primitives as a convenience for representational efficiency. In this view, 
the distinctive properties of primitives are that they occur on many objects in 
similar forms; that their presence is a useful guide to the presence of an object; 
and that their presence leads to distinctive image properties to guide inference. 
These appear to me to be statistical criteria. 

All these comments offer some hope for a rational theory of segmentation, if
we can extract information from posteriors easily.



The attraction of the Bayesian view is that (given a decent generative model)
all problems with integrating information disappear. Of course, the tricky bit
is extracting information from the posterior, and this has tended to be the
problem in the past. It is very easy to set up problems where the prior and the
likelihood all appear simple, and yet the posterior is very hard to handle (the
colour constancy example given below is of this kind). Vision problems are too
big and too disorderly to use conjugate distributions (an old-fashioned dodge
of using models where the prior, likelihood and posterior all turn out to have
an "easy" form). We might decide to choose a world model that maximises the
posterior, but how do we get this maximum? Bayesian segmentation using Markov
random fields foundered on this point.

A currently fashionable view in the statistical community says that
information can be extracted from the posterior by drawing a large number of
samples from that distribution. Thus, for example, if we want to decide whether
to blow something up or not based on image evidence, we would form the
posterior, draw samples from this, compute an expected utility for blowing it
up and for







David Forsyth 



not blowing it up by averaging the utilities for these samples, and choose the
option with the larger expected utility^.
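The decision rule just described can be sketched in a few lines. This is only
an illustration, assuming that samples from the posterior are already
available; the names `choose_action` and `utility`, the actions "act" and
"wait", and all the numbers are hypothetical and chosen purely for the example.

```python
import random


def choose_action(posterior_samples, utility, actions):
    """Return the action with the largest sample-averaged utility."""
    n = len(posterior_samples)
    expected = {
        a: sum(utility(a, s) for s in posterior_samples) / n for a in actions
    }
    best = max(expected, key=expected.get)
    return best, expected


# Stand-in for samples from a posterior: each sample is one hypothesised
# world state, here just "the object really is a target" (True/False).
rng = random.Random(0)
samples = [rng.random() < 0.9 for _ in range(10_000)]


def utility(action, is_target):
    # Hypothetical utilities: acting on a non-target is very costly.
    if action == "act":
        return 1.0 if is_target else -20.0
    return 0.0  # "wait" is neutral in this toy


best, expected = choose_action(samples, utility, ["act", "wait"])
print(best, expected)
```

Even though most samples say the object is a target, the large penalty on the
remaining samples makes waiting the better option here; the samples carry the
ambiguity that a single best estimate would hide.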

Drawing samples from a posterior is not at all easy. Markov chain Monte Carlo
methods appear to be the answer. A typical algorithm is the
Metropolis-Hastings algorithm, which would produce in this case a sequence of
hypotheses, by taking an hypothesis T_i and proposing a revised version, T_i'.
The new hypothesis T_{i+1} is either T_i or T_i', depending (randomly) on how
much better the posterior associated with T_i' is. Once sufficient iterations
have completed, all subsequent T_i are samples drawn from the posterior; the
number of iterations required to achieve this is often called the burn in time.
These samples may or may not be correlated; if this correlation is low, the
method is said to mix well. It is known how to apply this algorithm to chains
whose domain of support is complicated (for example, the number of hypotheses
may not be known a priori) [7].
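A minimal sketch of the accept/reject loop may help fix ideas. It assumes a
symmetric random-walk proposal and a one-dimensional toy posterior (a Gaussian
centred at 3); neither comes from the text, and the function name
`metropolis_hastings` and its parameters are illustrative only.

```python
import math
import random


def metropolis_hastings(log_post, propose, x0, n_iter, burn_in, rng):
    """Random-walk Metropolis-Hastings with a symmetric proposal.

    log_post is an unnormalised log posterior; propose(x, rng) returns a
    revised hypothesis. Samples produced before burn_in are discarded.
    """
    x, lp = x0, log_post(x0)
    samples = []
    for i in range(n_iter):
        x_new = propose(x, rng)
        lp_new = log_post(x_new)
        # Accept with probability min(1, posterior(x_new) / posterior(x)).
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            x, lp = x_new, lp_new
        if i >= burn_in:
            samples.append(x)
    return samples


rng = random.Random(1)
samples = metropolis_hastings(
    log_post=lambda t: -0.5 * (t - 3.0) ** 2,    # toy posterior: N(3, 1)
    propose=lambda t, r: t + r.gauss(0.0, 1.0),  # symmetric random walk
    x0=0.0,
    n_iter=20_000,
    burn_in=2_000,
    rng=rng,
)
mean = sum(samples) / len(samples)
print(round(mean, 2))
```

The fixed `burn_in` count is a convenience for the sketch; as discussed below,
there is no reliable way to tell from the samples alone that a chain has burnt
in.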

The Metropolis-Hastings algorithm should be viewed as a kind of souped up
hypothesize and test process. We propose a representation of the world and
accept or reject it based on the posterior; our representation of the posterior
then consists of a large set of accepted proposals. This view justifies using
current vision algorithms as a source of proposals. The crucial improvement is
that we can use different, incompatible algorithms as distinct sources of
proposals, and the samples we obtain represent the posterior incorporating all
available measurements. The example in section 3 illustrates this approach in
greater detail.

There are some serious algorithmic problems: it is not possible to tell
reliably whether a chain has burnt in by looking at the samples the chain
produces; chains can mix extremely slowly, and often do if not very carefully
designed [10]; and the difference between a useful algorithm and a catastrophic
failure rests on the proposal process. Exact sampling, or coupling from the
past, is a (not terribly practical) method for dealing with the first problem
[19, 17]; the other two are not going away anytime soon. The advantages of
representing ambiguity and error explicitly appear to outweigh these
difficulties.

3 An Example: Colour Constancy by Sampling 

Colour constancy is a good simple example that has some of the flavour of
recognition, in the sense that we are using a model to make inferences from
image observations. In the simplest version, we are in a world of flat frontal
surfaces, whose diffuse reflectances belong to a low dimensional linear family,
illuminated by a coloured source. We do not know the colour of the source, and
wish to determine surface colour, whatever the source colour. The problem can
seldom, if ever, be solved exactly.

There are many different mechanisms, each of which gives different estimates.
It is usual to assume that the illuminant changes slowly over space; cues



^ Right now, it would have to be a fairly slow target; but computers get faster. 







include an assumption of constant average surface colour [3, 11]; the subspace
of receptor space to which the surfaces map [15]; the presence of sharp changes
in image brightness [13]; the fact that specularities typically take the source
colour [12, 14]; and physical constraints on reflectance and/or illuminant [4].
All of these cues are basically valid and all should be used. Let us go back to
the case of colour constancy. We would like to use the following constraints:

- Illuminants vary only slowly over space.
- Specularities yield cues to surface colour.
- Illuminants and reflectances are drawn from finite dimensional linear
  families.
- Reflectances are everywhere above 0.012 and below 0.96 in value (this gives
  an 80:1 dynamic range at each wavelength, consistent with other methods and
  physical evidence).
- Illuminants are everywhere positive.

We ignore average reflectance, as it will be covered by the prior.

3 

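We model surface reflectances as a sum of basis functions $\phi_j(\lambda)$,
and assume that reflectances are piecewise constant:

$$s(x, y, \lambda) = \sum_{j=0}^{n_s} \sigma_j(x, y)\, \phi_j(\lambda)$$

Here $\sigma_j(x, y)$ are a set of coefficients that vary over space according
to some spatial model; in the example we describe, they are constant in a grid
of boxes, where the grid edges are not known in advance.

Similarly, we model illuminants as a sum of (possibly different) basis
functions $\psi_i$, and assume that any spatial variation is given by the
presence of a single point source positioned at $\mathbf{p}$. The diffuse
component due to the source is

$$e_d(x, y, \lambda, \mathbf{p}) = d(x, y, \mathbf{p}) \sum_{i=0}^{n_e} \epsilon_i\, \psi_i(\lambda)$$

where $\epsilon_i$ are the coefficients of each basis function and
$d(x, y, \mathbf{p})$ is a gain term that represents the change in brightness
of the source over the area viewed. The specular component due to the source is

$$e_m(x, y, \lambda, \mathbf{p}) = m(x, y, \mathbf{p}) \sum_{i=0}^{n_e} \epsilon_i\, \psi_i(\lambda)$$

where $m(x, y, \mathbf{p})$ is a gain term that represents the change in
specular component over the area viewed.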







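Standard considerations yield a model of the k'th receptor response as:

$$p_k(x, y) = \int \bigl( s(x, y, \lambda)\, e_d(x, y, \lambda, \mathbf{p}) + e_m(x, y, \lambda, \mathbf{p}) \bigr)\, \rho_k(\lambda)\, d\lambda$$

$$\phantom{p_k(x, y)} = d(x, y, \mathbf{p}) \sum_{i,j} g_{ijk}\, \epsilon_i\, \sigma_j(x, y) + m(x, y, \mathbf{p}) \sum_{i} h_{ik}\, \epsilon_i$$

where $g_{ijk} = \int \rho_k(\lambda)\, \psi_i(\lambda)\, \phi_j(\lambda)\, d\lambda$
and $h_{ik} = \int \rho_k(\lambda)\, \psi_i(\lambda)\, d\lambda$. In this case,
$\sigma_j(x, y)$ is piecewise constant with some spatial model; in what follows,
we assume that it is piecewise constant on a grid, but we do not know what the
grid edges are. The spatial model for the illuminant follows from the point
source model, where $\mathbf{p}$ is the position of the source, and
$m(x, y, \mathbf{p})$ is obtained using Phong's model of specularities.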
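The expanded form of the receptor response is simple to evaluate once the
tensors g and h are tabulated. The sketch below is only an illustration at a
single pixel: the function name `receptor_response`, the nested-list layout
`g[i][j][k]` and `h[i][k]`, and the toy numbers are assumptions made for the
example, not values from any real sensor or basis.

```python
def receptor_response(sigma, eps, g, h, d=1.0, m=0.0):
    """Evaluate p_k = d * sum_ij g[i][j][k] eps[i] sigma[j]
                      + m * sum_i h[i][k] eps[i]  for every receptor k.

    sigma: reflectance coefficients at one pixel
    eps:   illuminant coefficients
    d, m:  diffuse and specular gain terms at that pixel
    """
    n_receptors = len(h[0])
    out = []
    for k in range(n_receptors):
        diffuse = sum(
            g[i][j][k] * eps[i] * sigma[j]
            for i in range(len(eps))
            for j in range(len(sigma))
        )
        specular = sum(h[i][k] * eps[i] for i in range(len(eps)))
        out.append(d * diffuse + m * specular)
    return out


# Toy tensors: 2 illuminant bases, 2 reflectance bases, 2 receptors.
g = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
h = [[1.0, 1.0],
     [0.0, 2.0]]

p = receptor_response(sigma=[0.5, 0.25], eps=[2.0, 1.0], g=g, h=h, d=1.0, m=0.5)
print(p)

# The diffuse term is bilinear: doubling the illuminant and halving the
# reflectance leaves it unchanged, which is one reason the problem is hard.
p_a = receptor_response([0.5, 0.25], [2.0, 1.0], g, h, d=1.0, m=0.0)
p_b = receptor_response([0.25, 0.125], [4.0, 2.0], g, h, d=1.0, m=0.0)
```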

We choose a uniform prior for reflectance coefficients. We expect illuminants
to have no chromatic bias, and so use a Gaussian prior whose mean is white; we
allow a fairly substantial standard deviation to allow for illuminants that are
coloured.

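Thus, the generative model is:

- sample the number of reflectance steps in x and in y ($k_x$ and $k_y$
  respectively);
- now sample the position of the steps ($e_x$ and $e_y$ respectively);
- for each tile, sample the reflectance for that interval from the prior
  ($\sigma^m$ for the m'th tile);
- sample the illuminant coefficients $\epsilon_i$ from the prior;
- sample the illuminant position $\mathbf{p}$ from the prior;
- and render the image, adding Gaussian noise.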
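The steps above can be sketched as a small forward sampler. This is a heavily
simplified illustration, not the model used for the experiments: it assumes one
coefficient per colour channel (so rendering is a per-channel product), omits
the illuminant position (gain d = 1, no specular term), and truncates the
Gaussian illuminant prior crudely to keep it positive. The function name
`sample_scene` and all default values are hypothetical.

```python
import bisect
import random


def sample_scene(width, height, rng, max_steps=3, n_chan=3, noise_sd=0.01):
    """Draw one image from a simplified piecewise-constant colour model."""
    # 1. Number of reflectance steps in x and in y.
    kx = rng.randint(0, max_steps)
    ky = rng.randint(0, max_steps)
    # 2. Positions of the steps (distinct interior pixel boundaries).
    ex = sorted(rng.sample(range(1, width), kx))
    ey = sorted(rng.sample(range(1, height), ky))
    # 3. One reflectance vector per tile, uniform within the stated bounds.
    refl = [[[rng.uniform(0.012, 0.96) for _ in range(n_chan)]
             for _ in range(kx + 1)]
            for _ in range(ky + 1)]
    # 4. Illuminant coefficients: Gaussian around "white", kept positive.
    illum = [max(1e-3, rng.gauss(1.0, 0.3)) for _ in range(n_chan)]
    # 5. Render, adding Gaussian noise.
    image = []
    for y in range(height):
        ty = bisect.bisect_right(ey, y)      # tile row containing pixel y
        row = []
        for x in range(width):
            tx = bisect.bisect_right(ex, x)  # tile column containing pixel x
            row.append([refl[ty][tx][c] * illum[c] + rng.gauss(0.0, noise_sd)
                        for c in range(n_chan)])
        image.append(row)
    return {"kx": kx, "ky": ky, "ex": ex, "ey": ey,
            "refl": refl, "illum": illum, "image": image}


scene = sample_scene(16, 12, random.Random(0))
print(scene["kx"], scene["ky"], scene["ex"], scene["ey"])
```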


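So we have a likelihood,

$$P(\mathrm{image} \mid k_x, k_y, e_x, e_y, \sigma^m, \epsilon_i, \mathbf{p}),$$

and the posterior is proportional to

$$P(\mathrm{image} \mid k_x, k_y, e_x, e_y, \sigma^m, \epsilon_i, \mathbf{p}) \times \mathrm{Prior}(k_x)\, \mathrm{Prior}(k_y) \times \mathrm{Prior}(e_x)\, \mathrm{Prior}(e_y) \times \prod_{m \in \mathrm{tiles}} \mathrm{Prior}(\sigma^m) \times \mathrm{Prior}(\epsilon_i)\, \mathrm{Prior}(\mathbf{p})$$

All we have to do is draw samples from this.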



3 

The sampling process is straightforward MCMC. Proposal moves are of four
types:







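- Birth of a step: the details are as in example 1 of [7], with the exception
  that we derive a proposal distribution for the position of the step from the
  image gradient, so that we are more likely to propose a step where the
  gradient is high.
- Death of a step: as in example 1 of [7].
- Change of position of a step: as in example 1 of [7].
- Change of reflectance and illumination: this has some subtle pitfalls. It is
  tempting to fix illumination and change reflectance, then fix reflectance and
  change illumination, because both steps will involve sampling a Gaussian,
  which is easy. This, it turns out, is a bad idea, because the chain moves
  extremely slowly if we do this. The explanation is quite simple; given a
  reflectance/illumination pairing that is quite good, small changes lead to
  huge increases in the error. Thus, the process will lead to reflectances that
  work well with the current illuminant, or an illuminant that works well with
  the current reflectances, and will move extremely slowly. Instead, we use a
  method due to Neal [18] that suppresses random walks by joining momentum
  variables and then modelling the state as the position of a particle in an
  energy field; typically the state moves to a position of large posterior
  value quickly, at which point we throw away the momentum variables.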
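The momentum idea can be illustrated with a generic hybrid (Hamiltonian) Monte
Carlo step using leapfrog integration. This is a textbook-style sketch under
stated assumptions, not the sampler used for the results reported here: the
target is a toy two-dimensional Gaussian whose variables are strongly coupled
(standing in for the reflectance/illuminant coupling), and the names
`hmc_step`, `log_post`, `grad_log_post` and the step settings are hypothetical.

```python
import math
import random


def hmc_step(x, log_post, grad_log_post, step, n_leapfrog, rng):
    """One hybrid Monte Carlo update: draw momenta, simulate, accept/reject."""
    p = [rng.gauss(0.0, 1.0) for _ in x]
    h_old = -log_post(x) + 0.5 * sum(pi * pi for pi in p)

    x_new = list(x)
    grad = grad_log_post(x_new)
    p = [pi + 0.5 * step * gi for pi, gi in zip(p, grad)]      # half step
    for i in range(n_leapfrog):
        x_new = [xi + step * pi for xi, pi in zip(x_new, p)]   # full step
        grad = grad_log_post(x_new)
        if i < n_leapfrog - 1:
            p = [pi + step * gi for pi, gi in zip(p, grad)]    # full step
    p = [pi + 0.5 * step * gi for pi, gi in zip(p, grad)]      # half step

    h_new = -log_post(x_new) + 0.5 * sum(pi * pi for pi in p)
    # Metropolis test on the change in total energy; the momenta are then
    # thrown away and only the position is kept.
    if rng.random() < math.exp(min(0.0, h_old - h_new)):
        return x_new
    return list(x)


RHO = 0.95  # strong coupling between the two variables (toy value)


def log_post(v):
    x, y = v
    return -0.5 * (x * x - 2.0 * RHO * x * y + y * y) / (1.0 - RHO * RHO)


def grad_log_post(v):
    x, y = v
    s = 1.0 - RHO * RHO
    return [-(x - RHO * y) / s, -(y - RHO * x) / s]


rng = random.Random(2)
state = [1.0, -1.0]
chain = []
for _ in range(3000):
    state = hmc_step(state, log_post, grad_log_post, 0.15, 20, rng)
    chain.append(state)
kept = chain[200:]
mean_x = sum(s[0] for s in kept) / len(kept)
mean_y = sum(s[1] for s in kept) / len(kept)
print(round(mean_x, 2), round(mean_y, 2))
```

Because both variables move together along the simulated trajectory, the chain
can travel along the narrow ridge of the posterior instead of taking tiny
alternating steps across it.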



3 3 X X 



We show some results obtained using real data taken from [5]. This data was
photographed with a camera, then displayed on a screen, photographed from that
screen with a film camera, subjected to unknown printing processes, and then
scanned from the published paper; we used such battered data deliberately, to
illustrate the potential power of the process. There are no specular components
in this data set, making the specular reasoning simple. We obtained a basis
using the mechanism of Marimont and Wandell [16]. Constraints on coefficients
were estimated using graphical methods, and the spatial model places edges at
the right points; this is hardly surprising, as column and row edges come at
high contrast points.

Figure 1 shows a scatter plot of reflectance samples estimated for various
corresponding tiles, for images obtained under different coloured illuminants,
under the assumption that each image was completely independent. Since the
samples lie in reasonably close groups, compared with the receptor response
groups, the algorithm is displaying constancy. The samples contain not only
estimates of reflectance, but also information about a reasonable range of
solutions. This is crucial and powerful.

Current colour constancy algorithms cannot deal with prior knowledge about the
world. This algorithm can. For example, consider the effect of knowing that
tile i in an image under white light is the "same" as tile j in an image
obtained under purple light. It is quite plausible to want to know this;
knowing that an object is a banana should affect our report on its colour. In
our representation, we can perform this calculation by resampling the set of
samples. We sample pairs of representations, one obtained under the white
light, the other








under the purple light, so that pairs where the two reflectances are similar
are represented more often (this can all be made formal). Figure 2 shows the
effect on results; this information reduces uncertainty. Further details appear
in [6].
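One simple way to realise this kind of resampling is to weight every pair of
samples by how similar the two reflectance estimates are, and then draw pairs
in proportion to those weights. The sketch below is an assumption-laden
illustration, not the scheme of [6]: reflectance is reduced to a single number,
similarity is a Gaussian kernel with a hypothetical width `scale`, and the two
sample sets are synthetic.

```python
import math
import random


def resample_pairs(samples_a, samples_b, n_out, scale, rng):
    """Draw (a, b) pairs with probability proportional to
    exp(-(a - b)^2 / (2 * scale^2)), so similar pairs appear more often."""
    pairs = [(a, b) for a in samples_a for b in samples_b]
    weights = [math.exp(-((a - b) ** 2) / (2.0 * scale ** 2)) for a, b in pairs]
    return rng.choices(pairs, weights=weights, k=n_out)


rng = random.Random(3)
# Synthetic stand-ins for reflectance samples of the "same" tile, estimated
# independently from an image under white light and one under purple light.
white = [rng.gauss(0.40, 0.10) for _ in range(200)]
purple = [rng.gauss(0.50, 0.10) for _ in range(200)]

resampled = resample_pairs(white, purple, n_out=2000, scale=0.02, rng=rng)

gap_before = sum(abs(a - b) for a in white for b in purple) / (200 * 200)
gap_after = sum(abs(a - b) for a, b in resampled) / len(resampled)
print(round(gap_before, 3), round(gap_after, 3))
```

After resampling, the two estimates in each pair agree far more closely than
independent draws do, which is the sense in which the extra knowledge narrows
the set of plausible answers.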

4 How Inference Methods Can Help Address Our
Problems

Probabilistic models are difficult to use and to set up, and there are no
reliable algorithms for handling probabilistic models on the scale that vision
will require. The paradigm is the right one; we learned to use geometric models
well, and we should now be directing our efforts towards using probabilistic
models well.

While sampling algorithms are currently slow and difficult to build well, the
advantages that compensate for this are:

- We get a good representation of all the conclusions that can reasonably be
  drawn from the data (i.e. the posterior).



Fig. 2. [Plots not recoverable from the extraction; one panel is labelled
"resampled".]



- It is easy to see how to include other forms of information in the reasoning
  process; rewrite the likelihood, add more proposal mechanisms, and proceed.
  If a form of conditional independence applies (which it does in useful
  cases), we can resample an existing set of samples.
- We do not need to abandon current algorithms to achieve this; instead we can
  cloak them in probability and use them as a proposal process (as with the use
  of the gradient to propose step positions).



The colour constancy algorithm described above has a segmenter built into it
with very little fuss; this is the spatial model of reflectances being
piecewise constant in tiles. This illustrates a strength and a weakness of
sampling methods. The strength is that the relationship between segmentation
and recognition is built in; the image is segmented into regions that are given
by the model, and we are not required to perform a generic segmentation. The
weakness is that the model "hides" segmentation in the proposal process; if we
have good proposal processes, the problem of segmentation largely disappears,
and if they are bad, we may never see a sensible interpretation. The good news
is that there is some general information about what makes a good proposal
process (e.g. [18]), and that it is easy to obtain proposal processes from
current knowledge of vision.






3 



We would like mechanisms for learning likelihoods (P(measurement|model)) from
data. I expect that models will be represented by complicated likelihood
functions that would have hierarchical structure and intermediate
representation of one form or another. For example, a model for recognising
clothed people is likely to manipulate a representation of what clothing looks
like at one stage and what arms look like at another. The ideal is to lay out
efficient and accurate models automatically from pictures. We need to determine
what primitives to use. Geometric reasoning has been very little help here in
the past. The difficulty comes from obvious sources: a useful geometric
primitive will have a characteristic appearance in an image that is informative
about its 3D structure, and must also be useful for many objects and unlikely
to arise from non-objects. These criteria are statistical rather than geometric
in nature. Amit and Geman have constructed some promising algorithms for
automatically determining simple image primitives [1], but there is a long way
to go.



u 



Feature selection has received embarrassingly little attention in the
recognition community. The question is simple; which measurements should be
used to recognise an object? It appears in all current algorithms, in both the
indexing and the verification phase. The answer can only be statistical.

The main difficulty with statistical work on feature selection is the absence
of a notion of computational or measurement cost. The Bayesian answer to
feature selection is to use all measurements, appropriately weighted to take
account of their reliability. This is all very well, but it misses the fact
that measurements may be difficult or expensive to make, and that there is a
limited amount of computation available. It is quite surprising that this
question, which is obviously central in object recognition, has received so
little attention in the literature.

5 Summary 

Probabilistic models can be used to address the greatest weaknesses in our
understanding of object recognition: integration, abstraction, segmentation and
feature selection. The great power of probabilistic models is that they can
encapsulate sources of variation whose origins are too complex to bear
investigation; for example, the variation in the outline of a person's limbs
due to different muscle bulk. Sampling methods are good enough to produce
attractive solutions to simple vision problems. There are substantial
challenges in building efficient computational implementations, but these
difficulties are worth addressing.

Geometry and radiometry are simply not our most important area of ignorance.
In fact, we can currently predict the appearance of objects rather well, i.e.
we can build quite good likelihood models. We should now be studying modelling

n inf rnint ut rt torionwi ytmtiviw 

of t ov r 11 pro of r ognition will 







References 

[1] Y. Amit, D. Geman, and K. Wilder. Joint induction of shape features and tree
classifiers. IEEE T. Pattern Analysis and Machine Intelligence,
19(11):1300-1305, 1997.
[2] T.O. Binford. Visual perception by computer. In Proc. IEEE Conference on
Systems and Control, 1971.
[3] G. Buchsbaum. A spatial processor model for object colour perception.
J. Franklin Inst., 310:1-26, 1980.
[4] G. Finlayson. Colour in perspective. IEEE T. Pattern Analysis and Machine
Intelligence, 18:1034-1038, 1996.
[5] D.A. Forsyth. A novel algorithm for colour constancy. Int. J. Computer
Vision, 5:5-36, 1990.
[6] D.A. Forsyth. Sampling, resampling and colour constancy. In Proc. Computer
Vision and Pattern Recognition, 1999.
[7] P.J. Green. Reversible jump Markov chain Monte Carlo computation and
Bayesian model determination. Biometrika, 82(4):711-732, 1995.
[8] H. Helson. Some factors and implications of colour constancy. J. Opt. Soc.
Am., 48:555-567, 1934.
[9] C-Y. Huang, O.T. Camps, and T. Kanungo. Object recognition using
appearance-based parts and relations. In IEEE Conf. on Computer Vision and
Pattern Recognition, pages 877-83, 1997.
[10] M. Jerrum and A. Sinclair. The Markov chain Monte Carlo method: an
approach to approximate counting and integration. In D.S. Hochbaum, editor,
Approximation Algorithms for NP-hard Problems. PWS Publishing, Boston, 1996.
[11] D.B. Judd. Hue, saturation and lightness of surface colors with chromatic
illumination. J. Opt. Soc. Am., 30:2-32, 1940.
[12] G.J. Klinker, S.A. Shafer, and T. Kanade. A physical approach to color
image understanding. Int. J. Computer Vision, 4(1):7-38, 1990.
[13] E.H. Land and J.J. McCann. Lightness and retinex theory. J. Opt. Soc. Am.,
61(1):1-11, 1971.
[14] H.C. Lee. Method for computing the scene-illuminant chromaticity from
specular highlights. J. Opt. Soc. Am. A, 3:1694-1699, 1986.
[15] L.T. Maloney and B.A. Wandell. A computational model of color constancy.
J. Opt. Soc. Am. A, 1:29-33, 1986.
[16] D.H. Marimont and B.A. Wandell. Linear models of surface and illuminant
spectra. J. Opt. Soc. Am. A, 9:1905-1913, 1992.
[17] D.J. Murdoch and P.J. Green. Exact sampling from a continuous state space.
T V u , 1998.
[18] R.M. Neal. Probabilistic inference using Markov chain Monte Carlo methods.
Computer science tech report CRG-TR-93-1. University of Toronto, 1993.
[19] J.G. Propp and D.B. Wilson. Exact sampling with coupled Markov chains and
applications to statistical mechanics. Random Structures and Algorithms,
9:223-252, 1996.
[20] E. Rosch and C.B. Mervis. Family resemblances: Studies in the internal
structure of categories. Cognitive Psychology, 7:573-605, 1975.
[21] E. Rosch, C.B. Mervis, W.E. Gray, D.M. Johnson, and P. Boyes-Braem. Basic
objects in natural categories. Cognitive Psychology, 8:382-439, 1976.
[22] A. Zisserman, D.A. Forsyth, J.L. Mundy, C.A. Rothwell, and J.S. Liu. 3D
object recognition using invariance. Artificial Intelligence, 78:239-288, 1995.




A Formal-Physical Agenda for Recognition 



Joe Mundy

G.E. Corporate Research and Development

1 River Road, Schenectady, NY 12345

mundy@crd.ge.com



1 Overview 

1.1 The Recognition Task 



nr sk or o p r vision is o x r s rip ion o wor s 

on i s. n i por n no s rip ion is ss r ion sp i 



in 


ivi 




0 j s n pr vio 


s y 0 s rv or 


n 0 J 


is 


SI 1 


r 0 




3 0 0 


j s s n in p s . 


is pro ss 0 


recognition i 


r 


y 0 




0 


niz 


P 


r i s n r ion o 


xp ri n n 


VO ion 0 


r 


ions 


ips 




w 


n 0 


j s s on s ri s 


0 0 s rv ions. 
















i i y or o niz o j 


s in r 


s n wi 0 


P 


X i 


i 


n 


ion 


n 


s ows s prov n 


0 on 0 


os i 




n s 


. or 


0 


P 


r 


vision. is ss r ion is 


in 0 p r 


n wi 




in 


r s 


0 




0 


p nion r i y v 


orsy . r w 


i r is w 


0 


P 


siz 


in 


or 


r 0 k pro r ss. 













irs i is ss n i or pro o r o ni ion in i or i 



so si ni n iss s 


n 




ry 


1 n 1 




n 


so 0 1 


1 s 


0 r 0 ni ion ris ro 


w 




r p i osop 


i 


pro 


s wi 


ni ion 


0 class i s 


i pi 


i 


ss 


p ion 


is 




r is 


so thing in 


wor 


w i n 


sso i 


wi 


0 


s rv 




ri 


s 


n s 


ri s 


r in so 


s ns invariant 


ro 


on 0 


s rv 


ion 


0 


n X . r is 


so 


p r ss 


p ion 




n i 


y is 


is r 


n 


n vis y s 


n 


ro 


k ro n 0 


0 


r 


ri . 










ss 


p ion 0 


ss 


is 




n 


r 0 


in ivi 0 s rv 


in s 


r in so 


w y sso i 




n s 


0 


r 




s ins n s 0 or 


n r 


on p . 


is r ss 


p ion is pri 


riy 




n i 


posi ion on r i 


yw i 


provi s 


ns or on 


ro 


in 


0 p 


xi y 


n 


or r 


sonin o 


wor 


ro n s. 




















n ri 


s 0 p i osop 


i 






V 




in 0 


q s ion v n 


s p 



prnyn n ss p ions. wi no k sp r o xp or s 
onsi r ions ow v r pro srsvrn isks o sisss 



V o onsi r ws rsr or.orx p iws pop r in 
19 O’s o ry o i p n n ri r o ni ion ss s s s house or chair 
wi o r izin r w s no op r ion ss ni ion possi 

p i osop is o irs o y is s riz y o owin wo 
prin ip s 

D.A. Forsyth et al. (Eds.): Shape, Contour LNCS 1681, pp. 22-27, 1999. 

© Springer-Verlag Berlin Heidelberg 1999







1. on y in ivi o j s xis ; 

2. ss is n y i s in ivi rs w i r s or. 

or o r p rpos s i wi ss wo o j s r s is in 

n r o significant vis ri s r similar. r r ni ion o 

o j ss s is s on s vis si i ri y n in p rpos o 

ss s is o n effective r o ni ion. 





ni ion 


0 w 


IS 


SI ni 


n 


IS n 


r 


on 


pro 


ss s 


0 1 




n 


ion n 


sp i 


or 


niz 


ion. 


si ni 




n ri 




n 


r i 


y 


inv 


ri n y X 


r 


ro 


i 


S 0 


n n i 


y- 


si ni 


n 


ri 


is 


so 



salient in i s v ri ion is pri ri y o i r n s in o j ss n no 
o o s rv ion n i r n s w n in ivi so ss. ni ion 

0 w is si i r r q ir s so q n i iv s r on o j ri s w r 

s r is so ppin ro ri s o s o or r v s. 

1.2 The Current State 

n r y s ro is s o ni ions w y r o ni ion is in rinsi y 
r . i ns r q i xp r r o ni ion w v i or nr 

s n in o ri s w i i s is y sir prop r i s j s s 

nv s i ions o p ysio o i n vior s s o n vision v 

provi i insi so i r pr s n ion o o j os 

si or r o ni ion. 

ns in ppro ov r or y so o y rs o o p r vision 

rsr s noppyor rprsn ions n i y riv 

ori s. s r pr s n ions n ori s r orrow ro iv rs s 
s spoor ry op i s o i in is i s o ry n ri n ysis 

s is i s 

s i p ysi is ip in s v prov n iv in p r i r 

pro o ins nr ssi o ni y pop ion w o 

rk on o p r vision r s r . s ons q n opr vision r s r 
is o in y s w r n w or nis or x p r p ory is 

in ro s issin ink w i wi ni y pr vio s ppro s n provi 

on so ri s n s r s r q ir or r o ni ion os. i 

s sw pin i s r n V r r iz in ro ion o n w r pr s n ion 

in ry is o n n i n so pro r ssr s s. ow v r n n 
i i s r s i wi s. 

si ion s ri in in ro ion is n r pi ion o o r 
s o piiy. V iiyoxr in nsi y is on in i i s ro 

1 ry n s o o n r ions o s ow y v ryin in nsi y r pr s n in 

proj ion osr p sinop rsp iv i ry. proj 2 

o ryo o in on ri so sp n r ypri 

orwnsyi o ori vior o n r s p ss s. 

r r w n rs 00 i on ions o r i proj iv 

o ry s o or p rsp iv i or ion s o iv onsi r 

rsrovr s ovopo rioj os. nvrin 







o ri s rip ions orpnrs ps v prov on iv in x 

in wi s ow r row in r q ir v ri ion n or n x s iv 

onsi r ion o known o s 10 . 

ow V r ons r ion o s inv ri n s r q ir s r n os 

op o n ry s n ions w i nno i v n r o p x 

s n on i ions. i ion y x nsion o s inv ri n o s o 

spp vnr r rnson ryn op nssos 

n ion. r r inv rinso nr sssr known on y or s p s 

wioysri i rqir ns. vno ory or s 

o is or ion ro x sp oss ssoiorvo ion or o j 
wi i r sy ry. 

s on n r o niz in nsi y o s r s onv ys i por n 

in or ion or r o ni ion o in s o i in ion ir ion 

n V ri ions insr rflnnxr o os npr i 

proj i in nsi y v s or iv n o j s r p . is no 
in nsi yv srrno iis p ysi si ion is o p x wi 

ny nknown p r rs. 

s o r o p ysi s n o ry is s r in nsi y 

si ions n i v w n or s rin o s r o on 

ro s r ri s n w n i in ion so r s r known. rr n 

0 j r o ni ion pr i s no si ni n s o is p i i y 

so vri iiyooj sr swiin ssni nor n o o 

1 in ion o s owin n r fl ions. 

wi is si ion n w w s ini i in r o n ion r 

sr o i yrson s o known s pp r n s r o 



ni ion. 


ppro 


is n 


n 


y pattern recognition{ . 


w r s 


0 r 


s is n 


n 


s 


is i r 


ions 


ip 


w 


n r s n 


ss is s 


is 


piri y. 




ppro 


0 s no 


pr 




in ro ion o 


s r r 


in r s 0 p r 


riz 


0 s 




work 


0 


s p siz 


pry 


piri 


s rin . 














wi 


r 




is r 


i n on 


piri 




0 


s w i pp r n y 


s ss 


wi no 


n r 


pro 


ss in 


pro 


0 


0 j 


r 0 ni ion. 


r n 


is or 


in 


0 


owin s 


ion. 









2 Theory vs Empiricism 

n orsy r i i is r pro i i y provi s n iv o 

o wor n is k y o pro r ss. r q i i r n p sis is pro 
pos is o r o o r or s o i v pr i or o o 

pp r n o o j ori s. 

irs w is or o ? y formal w n w v o i 

o p ion ni ion onoj sswi so pri ion o 

i pp r n o ny ins n o ss. xp o 

r q ir s sp i ri so o j ins n i s nsor n i in ion 





in or r o ons r pprn. v oswi v wpr rs 

o p r o pr i pp r n 

s s so r ni ion wo no r o s is i o s. n 

i ion r q ir n on or o is i p o s r ion. 

or X p s ppos w V n iv r o ni ion o or ory 



horse. 




k y n 


ion 0 


or 


0 is 


0 


0 xpr 


ss 




i 


r n s 


w 


n 


ors s 


n p 


n s 




0 wo 




0 riv 


n 


w r 


0 


ni ion 


0 


y knowin 


s r r 




i 


r n s n 


i r 


n s in s r 






ri 


n 


S 0 






0 sp iy 




ss 


ni ion in r 


so VO 




ry 


0 


or 


s r 


r 


s. 






















n 


ppro 


0 is r 


q 


ir 


n s 


n 


ss s rip 


ion y parts, n 


0 j 


is 


0 pos 


0 p r s 


P 


r 


ps s ri 


0 


ri y 


n 


ons 


r 


in s on 




0 


ri r 


ions ip 




w 


n p r s. 


is r 


in 


r p 


r ' 


ons r in s 


n 




riv 


piri y 


or 




iv n ss 


2 . 


ow V r 


V ry sp 


i 


ion 


0 


P 


r 0 posi ion i s 




is 


no piri 


i r 


pr s n s 


0 


i 




n 0 


or 




ni ion 


0 ss. 




















i 


is r 


iv s r 




r 


r pr s n 


ion 


ri s 


i 


r n 




w n 



or n piri o s r pr s n ion yprspr i sno n 
p r i r y s ss or r o ni ion. k y i y is r ov rin p r s 
n p r r ions ips ro i ry in o p x i s n s . is i y s 
n in rpr s i r o or ss r pr s n ion n s in r s 

p sis on s is i ss o s. 

is ss r or is is no o ins w n o 

V op ory o ss s s on ir pp r n n w is r ov r 
ro i ry n r o sion n o p x i in ion. o k y q s ion is 
ow o w iv y r in w is r ov r ? 

is propos 0 0 s on p ysi y riv o s. is r sin 

o r i i or r pr s n ion o p ysi r i y n x n in i is s 

w no. i isoi yii oj sssw nip nwn 
wos w nnrsnwyn nripi ions os s 

r s s. 

n is ppro o j s rip ions r s on p ysi o s s s 
o ry n r fl ion ory. ory o so ins is s o riv 

s rip iv inv ri n s w i riv ni ion o r ov r ri s. is 

is no o s y p r rs o s s rip ions n sp i priori, 

ns w xp or n ysis o r i nsion i y o s rip 

ion o n s o r p r rs r riz in ivi o j 

ins n s. 




V n 



o owin r 







i ir ion fl n is ri ion n ion( ) in ro y 

o n rink 6 provi s n n o in ion o or i n piri o 



s. 




0 n 


rink o 


n s or 




n r 


prop 


r i s 0 


s r i 


i 


n ion 




wi 


r p r rs 


n 




q ir 


ro 


piri 


0 s rv 


ions 


or s 


r 


s. 


is on p s 


n r 




0 pr 


i 


y n 


w 


r 


ov r 60 


i 


r n 


s r s w r 


s r 


n 




piri 




s ons r 




s r 


s 


s 


iv op 0 


p X 0 


j 


PP 


r n 


n 


0 


y 



ri in o j s r s wi piri y riv s. 





r 


r 


V n 




is 


work 0 0 






r n 


ri 


n 1 


W 0 


s ow 






sp 


0 possi 


i 


s or 


iv 


n 


s r 


s 


ony 


r 


r 


S 0 


r 


0 . 


or po y 




r 


is sp 


is 




onv 


X on 


n 


y 


r 


n 


0 i 


in 


ion ir 


ions 


n s r 


nor 


S. V 


n 0 


i 


i 


0 L 




r i n 


r fl 


n 


n 


ions 


work 0 


is 


yp 


is n i 


por 


n s 


p in 


V n in 


0 r 


or 


i 


n 


rs n 


in 0 


PP 


r n 


0 


s. 


ons r 


in 


si i 


r 


ori 


s s 


on 


or 


r pr 


s n iv s : 


r 


r fl 


n 


s r 


ions 


ik 




0 n 


rink 




0 


is 


y 


n 




n r iz 


ion. 








r 


V 


n 


w 




P s 


0 ir 


y in 


r 


in nsi y 


n 0 





ri in or ion in r o ni ion sys s. P r ps os x nsiv work is 

o i 9 w r o in nsi y riv iv jets r s o ons r n in 

V ri n r pr s n ion oo oj pprn. isrprsn ion is p r y 



piri r iv y i no o sion n s owin i i ions o 

or o in nsi y r pr s n ions s in L M 7 . inv ri n o o 
r in nsi y ri so onsi r y v n y in orpor in 

rspr riz yrfl n o s. yno onsi r ri prop r i s 
s w so ri prop r i s? o v ry iv ’s v on y r 

p r rs n o q ir ro r or or i in ion vi win on 

i ions. 

r n sis y 00 s s ows v n o s in ro ion o or 

oinnry r ro piri rnin . oo s s s 

r ions ip w n i in ion so r ns nsor vi wpoin wi r sp o 
oj sr so r riz sps piri y riv ro s o i 
s. s ows is r riz ion r y i prov s r o rnin 

yvnr y on in or sys i v ri ions w i r o o 

sion n s owin . s ows r i i prov n in r i i y o 

o j in n i V wi j s on i s p sin in r 

piri n or os. 

4 Summary 

is isos V ro opyinoj roni ion in y o on 

or V ri ions w nno or oos no o ir y o . so no 
s s i or workin r on n rs n in r ions ip w n p ysi 
prop r i s ss n i pp r n . 

Pro i i y is so s ri or vi n . n is r r r is no 

on r s o ss r ions y orsy . k y is in ion in sis r is 





o j ssi ion vi n k s wo or s o i r n on i ion 
pro i i y. ro o on i ion pro i i y is o on or nknown v ri 
ions n nknown r ions ips no s s s i or workin o r 

o i os. 

References 

[1] P.N. Belhumeur and D.J. Kriegman. What is the set of images of an object
under all possible lighting conditions? In IEEE Conf. on Computer Vision and
Pattern Recognition, pages 270-277, 1996.
[2] T.F. Cootes, G.J. Edwards, and C.J. Taylor. Active appearance models. In
European Conference on Computer Vision, 1998.
[3] K.J. Dana, S.K. Nayar, B. van Ginneken, and J.J. Koenderink. Reflectance and
texture of real world surfaces. In IEEE Conf. on Computer Vision and Pattern
Recognition, pages 151-157, 1997.
[4] R.O. Duda and P.E. Hart. Pattern classification and scene analysis. Wiley,
1973.
[5] D.A. Forsyth and M.M. Fleck. Body plans. In IEEE Conf. on Computer Vision
and Pattern Recognition, 1997.
[6] J.J. Koenderink, A.J. van Doorn, and M. Stavridi. Bidirectional reflection
distribution function expressed in terms of surface scattering modes. In
European Conference on Computer Vision, 1996.
[7] H. Murase and S. Nayar. Visual learning and recognition of 3D objects from
appearance. Int. J. of Comp. Vision, 14(1):5-24, 1995.
[8] B.D. Ripley. Pattern recognition and neural networks. Cambridge University
Press, 1996.
[9] C. Schmid and R. Mohr. Combining greyvalue invariants with local constraints
for object recognition. In Proceedings of the Conference on Computer Vision and
Pattern Recognition, San Francisco, California, USA, June 1996.
[10] A. Zisserman, D.A. Forsyth, J.L. Mundy, C.A. Rothwell, and J.S. Liu. 3D
object recognition using invariance. Artificial Intelligence, 78:239-288, 1995.




Shape Models and Object Recognition 



Jean Ponce, Martha Cepeda, Sung-il Pae, and Steve Sullivan* 

Dept. of Computer Science and Beckman Institute
University of Illinois, Urbana, IL 61801, USA 



1 Introduction 

This paper discusses some problems that should be addressed by future object 
recognition systems. 

In particular, there are things that we know how to do today, for example: 

1. Computing the pose of a free-form three-dimensional object from its outline 
(e.g. [106]). 

2. Identifying a polyhedral object from point and line features found in an 
image (e.g., [46, 89]). 

3. Recognizing a solid of revolution from its outline (e.g., [59]). 

4. Identifying a face with a fixed pose in a photograph (e.g., [10, 111]). 

There are, however, things that we do not know how to do today, for example: 

1. Assembling local evidence into global image descriptions (grouping) and us- 
ing those to separate objects of interest from the background (segmentation). 

2. Recognizing objects at the category level: instead of simply identifying Bar- 
ney in a photograph, recognize that he is a dinosaur. 

This is of course a bit of an exaggeration: there is a rich body of work on
grouping and segmentation, ranging from classical models of these processes in 
human vision (e.g., [65, 116]) to the ever growing number of computer vision 
approaches to edge detection and linking, region merging and splitting, etc., 
(see any recent textbook, e.g., [38, 73] for surveys). Likewise, almost twenty 
years ago, ACRONYM recognized parameterized models of planes in overhead 
images of airports [17], and the recent system described in [33] can retrieve 
pictures that contain horses from a large image database. Still, segmentation 
algorithms capable of supporting reliable recognition in the presence of clutter 
are not available today, and there is no consensus as to what constitutes a good 
representation/recognition scheme for object categories. 

This paper examines some of these issues, concentrating on the role of shape 
representation in recognition. We first illustrate some of the capabilities of cur- 
rent approaches, then lament about their limitations, and finally discuss current 
work aimed at overcoming (or at least better understanding) some of these lim- 
itations. 

* This work was partially supported by the National Science Foundation under grant 
IRI-9634312 and by the Beckman Institute at the University of Illinois at Urbana- 
Champaign. M. Cepeda is now with Qualcomm, Inc. and S. Sullivan is now with 
Industrial Light and Magic. 



D.A. Forsyth et al. (Eds.): Shape, Contour LNCS 1681, pp. 31—57, 1999. 
© Springer-Verlag Berlin Heidelberg 1999







Jean Ponce et al. 



2 The State of the Art and Its Limitations 

Let us start with an example drawn from our own work to illustrate the capa- 
bilities and limitations of today’s recognition technology. While it can certainly 
be argued that more powerful approaches already exist (and we will indeed dis- 
cuss alternatives in a little while), this will help us articulate some of the issues 
mentioned in the introduction. 



2.1 An Example of What Can Be Done Today 

Here we demonstrate that the pose of a free-form surface can be reliably esti- 
mated from its outline in a photograph. Two obvious challenges in this task are 
(1) constructing a model of the free-form surface that is appropriate for pose 
estimation and (2) computing the six pose parameters of the object despite the 
absence of any three-dimensional information. 

We have developed a method for constructing polynomial spline models of 
solid shapes with unknown topology from the silhouette information contained 
in a few registered photographs [106]. Our approach does not require special- 
purpose hardware. Instead, the modeled object is set in front of a calibration 
chart and photographed from various viewpoints. The pictures are registered 
using classical calibration methods [110], and the intersection of the visual cones 
associated with the photographs [9] is used to construct a G^-continuous trian- 
gular spline [22, 30, 60, 100] that captures the topology and rough shape of the 
modeled object. This approximation is then refined by deforming the spline to 
minimize the true distance to the rays bounding the visual cones [108]. Figure 
1(a)-(b) illustrates this process with an example. 

The same optimization procedure allows us to estimate the pose of a modeled 
object from its silhouette extracted from a single image. This time, the shape 
parameters are held constant, while the camera position and orientation are 
modified until the average distance between the visual rays associated with the 
image silhouette and the spline model is minimized. In fact, the residual distance 
at convergence can be used to discriminate between competing object models in 
recognition tasks. Figure 1(c)-(d) shows some examples. But... 

2.2 Is This Really Recognition? 

Of course not: we have relied on an oracle to tell us which pieces of contours 
belong to which object, since the spline representation does not provide any support 
for top-down grouping (as shown by Fig. 1(d), occlusion is not the problem). 
Although it is possible that some bottom-up process (e.g., edge detection 
and linking, maybe followed by the elimination of small gaps and short contour 
pieces and the detection of corners and T-junctions) would yield appropriate 
contour segments, this is not very likely in the presence of textured surfaces and 
background clutter. Indeed, the contours used as input to the pose estimation 
algorithm in Fig. 1 were selected manually [107]. 




Shape Models and Object Recognition 







Fig. 1. Automated modeling and pose estimation: (a) nine views of a (toy) dinosaur; 
(b) the corresponding spline model, Gouraud-shaded and texture-mapped; (c) and (d) 
results of pose estimation. 






The Barney (or maybe T. Rex in this case) vs. generic dinosaur problem 
mentioned earlier is also apparent here: the spline is a purely numerical object 
description, and it is hard to imagine how it would support the recognition of 
object classes. Another possible objection to this approach is the lack of support 
for modelbase indexing, but this may not be that serious a problem: after all, 
the cost of matching every model to the given contour data is only linear in the 
size m of the database (in our example, m = 3, a rather small value). 

All this does not mean that the spline models of Fig. 1 are not useful in 
practice: indeed, they provide a low-cost alternative to using, say, a Cyberware 
rangefinder to construct detailed graphical models of three-dimensional objects. 
Likewise, pose estimates could be used in pick-and-place robotic manipulation 
of isolated^ objects presented on a dark background. However, it is pretty clear 
that this approach does not hold the key to constructing the recognition systems 
of the future. The next section discusses alternatives. 



2.3 What Else Then? 

Numerical/combinatorial methods use geometric filters to identify a sufficient 
number of matches between image and model features. They include alignment 
techniques (e.g., [46, 62, 89, 112]) and affine and projective invariants (see, for 
example, [26, 115] for early applications of invariants to object recognition, and 
[69, 70] for recent collections of papers). In the former case, matching proceeds 
as a tree search, whose potentially exponential cost is controlled by using the 
fact that very few point matches are in fact sufficient to completely determine 
the object pose and predict the image positions of any further matches (see 
[4, 31, 36] for related work). In the latter case, small groups of points are used 
to directly compute a feature vector independent of the viewpoint (hence the 
name of invariant) that, in turn, can be used for indexing a hash table storing 
all models. An advantage of invariants is that indexing can be achieved in sub- 
linear time. A disadvantage is that groups of three-dimensional points in general 
position do not yield invariants [20, 23, 68] (but see [32, 59, 120] for certain 
object classes that do admit invariants). 
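
The indexing idea can be sketched for the simplest case, coplanar points seen under an affine camera, where the coordinates of a fourth point in the frame defined by three others are unchanged by any affine map of the plane. The Python sketch below is only an illustration under that assumption: the function names, the cell size, and the use of the first three points as an already-matched basis are ours and not taken from the systems cited above, which must also search over candidate bases.

```python
import numpy as np
from collections import defaultdict


def affine_coordinates(basis, point):
    """Coordinates of `point` in the affine frame (p0; p1 - p0, p2 - p0)."""
    p0, p1, p2 = basis
    frame = np.column_stack((p1 - p0, p2 - p0))  # 2x2 frame matrix
    return np.linalg.solve(frame, point - p0)


def quantize(coords, cell=0.05):
    """Turn invariant coordinates into a discrete hash key."""
    return tuple(int(v) for v in np.round(coords / cell))


def build_table(models, cell=0.05):
    """Hash every non-basis model point by its affine-invariant coordinates."""
    table = defaultdict(set)
    for name, pts in models.items():
        for p in pts[3:]:
            table[quantize(affine_coordinates(pts[:3], p), cell)].add(name)
    return table


def vote(table, image_pts, cell=0.05):
    """Look up each image point's invariant key and count votes per model."""
    votes = defaultdict(int)
    for p in image_pts[3:]:
        key = quantize(affine_coordinates(image_pts[:3], p), cell)
        for name in table.get(key, ()):
            votes[name] += 1
    return dict(votes)


# Two hypothetical planar models (first three points form the basis).
models = {
    "A": np.array([[0, 0], [1, 0], [0, 1], [0.3, 0.7], [0.62, 0.21]], float),
    "B": np.array([[0, 0], [1, 0], [0, 1], [0.9, 0.9], [0.11, 0.48]], float),
}
table = build_table(models)

# An affine view of model A: the invariant coordinates are unchanged.
A = np.array([[1.2, 0.4], [-0.3, 0.9]])
t = np.array([5.0, -2.0])
image_pts = models["A"] @ A.T + t
votes = vote(table, image_pts)
print(votes)
```

Each lookup costs the same whatever the number of stored models, which is the sense in which indexing can be sub-linear in the size of the database.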

Alignment- and invariant-based approaches to object recognition do not re- 
quire (in principle) a separate grouping stage that constructs a global description 
of the image and/or its features: instead, it is in principle sufficient to construct 
a reliable local feature detector (a much easier task), since spurious features will 
be rejected by the matching process. On the other hand, as in the case of spline 
models, it is not clear at all how these techniques, with their purely numerical 
characterization of shape, would handle object categories. 

Appearance-based approaches are related to pattern recognition methods. They 
record a description of all images of each object, and they have been successfully 
used in face identification [10, 111] and three-dimensional object recognition 

^ But possibly quite complex (at least by computer vision standards): have another 
look at the gargoyle and the dinosaur in Fig. 1. 







[71]. Their main virtue is that, unlike purely geometric approaches to recogni- 
tion, they exploit the great discriminant power of image intensity information. 
However, it is not clear how they would generalize to category-level recognition 
(see, however, [8] for preliminary efforts in that direction), and they normally 
require (due to their essentially global nature) a separate segmentation step that 
distinguishes the objects of interest from the image background. For example, 
the three-dimensional object recognition system described in [71] uses images 
where isolated objects lie in front of a dark background. Despite its very impres- 
sive performance (real-time recognition of complex free-form, textured objects 
from a single photograph), it is unlikely that this system would perform as well 
with occlusion and background clutter. 

It should be noted that the recognition method proposed in [94] combines 
the advantages of invariant- and appearance-based techniques: it uses quasi- 
invariant properties [15] of the projection process to faithfully represent all 
images of an object by a small set of views, and relies on local descriptions 
(jets, see [54]) of the brightness pattern at interest points [40] for efficient 
indexing. Explicit segmentation is avoided, and the results are excellent (e.g., a 
dinosaur (again!) is easily recognized in a picture that contains a very cluttered 
forest background). Like other appearance-based techniques (e.g., [111]), this 
one is sensitive to illumination changes, although recent advances in color anal- 
ysis [41, 102] suggest that combining the jets used in the current version of the 
system with local illumination invariants may solve the problem. Again, this 
approach does not address the problem of representing object classes. 

Structural representations. An alternative to the techniques discussed so far is to 
describe objects by part-whole networks of stylized primitives, such as general- 
ized cylinders [13] or superquadrics [5, 77]. This approach has several (potential) 
advantages: first, it offers a natural representation for object classes (similar ob- 
jects will hopefully have similar part-whole decompositions). Second, and maybe 
not quite as obviously, appropriate primitives may prove useful in guiding top-down 
grouping and segmentation: e.g., to look for cylinder-like structures in an 
image, look for pairs of curves that are more or less straight and parallel to each 
other [63]. There are also some known difficulties: for example, the images of 
a particular class of primitive shapes may not have simple properties that can 
readily be exploited by a top-down segmentation process. Another very difficult 
problem is to precisely and operationally define what constitutes an object part. 

A different type of structural representation is what might be called a weak 
modeling scheme, i.e., an approach where only very general assumptions are 
made about the world, and general mathematical results valid under these as- 
sumptions are used to parse images. Aspect graphs [52, 53] are an example of 
this approach: they exploit the fact that both the structure of image contours 
and the manner in which it changes with viewpoint are mathematically well 
understood. Hopefully, similar objects will have similar aspect graphs, and the 
understanding of image structure may serve as a guide to image segmentation. In 
practice, however, aspect graphs have not fulfilled their promise, partly because 
of the great difficulty of reliably extracting contour features such as termina- 







tions and T-junctions from real images, and partly because of the fact that even 
relatively simple objects may have extremely complicated aspect graphs (e.g., 
a polyhedron with n faces has an orthographic aspect graph with $O(n^6)$ cells 
[35]; the situation gets even worse when perspective projection [105] and curved 
objects [78] are considered). 

Still, at this point we believe that structural approaches to recognition offer 
the best hope of tackling the segmentation and class representation problems, 
so we will revisit in the next two sections both generalized cylinders and aspect 
graphs (the latter in the context of evolving shape), and discuss some new twists 
that we are currently exploring. Before that, let us stress that we do not claim 
that generalized cylinders or aspect graphs are the way to go. Nor do we claim 
that the results presented in the next two sections are particularly impressive 
or an improvement over existing recognition technology. Rather, we believe that 
it is important to assess what these representation schemes really have to offer 
since we do not know of viable alternatives at this time. 

3 Generalized Cylinders 

Binford introduced generalized cylinders (GCs) in a famous unpublished 1971 
paper [13], defining them in terms of “generalized translational invariance”. 
Roughly speaking, a GC is generated by a one-dimensional set of cross-sections 
that smoothly deform into one another. Binford noted that a space curve form- 
ing the spine of the representation may be defined (but not always uniquely, e.g., 
a cylinder), and that “in general, we don’t expect to have analytic descriptions 
of the cross-section valued function, or the space curve called the spine” [13]. 

Such a definition is very general and quite appealing, but it is also very 
difficult to operationalize: in other words, although a great many objects (the 
fingers of my left hand for example) can certainly be described by sweeping 
smoothly-deforming cross-sections along some space curve, it is not clear at all 
how to construct the description of a given shape in a principled way. 

Most of the early attempts at extracting GC descriptions from images fo- 
cused on range data [1, 74, 103]. Among those, the work of Nevatia and Binford 
[74] is particularly noteworthy since it does implement a version of generalized 
translational invariance: their algorithm tries all possible cross-section orienta- 
tions of objects such as dolls, horses, and snakes, then selects subsets of the 
cross-section candidates with smoothly varying parameters. 

Methods for finding GC instances in video images have traditionally been 
based on the assumption that three-dimensional GCs will project onto two- 
dimensional ones, or ribbons: this is the approach used in the ACRONYM system 
of Brooks and Binford [17, 18] for example. For the limited class of GCs (circular 
cylinders and cones) and ribbons (trapezoids and ellipses) used by ACRONYM’s 
geometric reasoning system,^ there is indeed a natural correspondence between 
GC projections and ribbons, and the assumption is justified. 

^ ACRONYM employed a wider repertoire of primitives for geometric modeling, but 
not to draw inferences from images [17]. 







The situation is not as clear for more general GC and ribbon classes, and 
this is probably one of the reasons why, following ACRONYM, most vision sys- 
tems using GCs as their primary shape representation moved toward simpler, 
and better understood, parametric and globally generative GC classes: Shafer 
and Kanade [97, 98] paved the way by introducing a taxonomy of generalized 
cylinders, of which the most commonly used today are probably straight ho- 
mogeneous generalized cylinders (SHGCs, also called generalized cones [44, 66] 
and constructed by scaling an arbitrarily-shaped planar cross-section along a 
straight spine) and solids of revolution (or RCSHGCs in Shafer’s and Kanade’s 
terminology). Another example of a sub-class of GCs is formed by geons, a set 
of twenty four GC types proposed by Biederman in the context of human vision 
[12], which have recently found use in machine vision [11, 25, 75]. 

Limiting the class of GCs under consideration makes it possible to predict 
viewpoint-independent properties of their projections [72, 83, 81]: for example, 
Nalwa proved that the silhouette of a solid of revolution observed under orthographic 
projection is bi-laterally symmetric [72] (i.e., it is an instance of a straight 
Brooks ribbon [90]). Ponce, Chelberg and Mann showed that, under both ortho- 
graphic and perspective projection, the tangents to the silhouette of an SHGC 
at points corresponding to the same cross-section intersect on the image of the 
SHGC’s axis [84]. Other important problems (e.g., whether a shape admits a 
unique SHGC description [80]) can also be addressed in this context. 

Such analytical predictions provide a rigorous basis for finding individual GC 
instances in images [45, 84, 86, 93, 113, 118, 119] or recognizing GC instances 
based on projective invariants [59], and, indeed, very impressive results have 
been achieved: for example, the system implemented by Zerroug and Medioni is 
capable of automatically constructing a part-based description of a teapot from 
a real image with clutter and textured background [118]. 

Despite these undeniable successes, we believe that it is necessary to go be- 
yond simple sub-classes of GCs: most objects around us are not made up of 
instances of solids of revolution, SHGCs, canal surfaces, etc.: we do not live in 
a world made of glasses, bottles, cups and teapots. Restricting the application 
domain is useful, but in the end we must interpret scenes that contain the familiar 
objects in my office (stapler, scissors, telephone, and people) that may (or 
may not) have elongated parts, but cannot be described in terms of a small set 
of rigid primitives. This is our motivation for introducing a new breed of GCs in 
the next sections. 



3.1 Approach 

The notion of generalized cylinder proposed in this section is based on the intu- 
ition that the elongated parts of a shape are naturally described by sets of cross- 
sections whose area is as small as possible. These cross-sections correspond to 
valleys of a height function that measures the area of all possible cross-sections 
of the shape. The following definition of the topographic notions of valleys (or 
ravines) and ridges was given by Haralick, Watson, and Laffey [37, 39] in the 
context of image processing. 







Definition 1. The valley (resp. ridge) of the surface defined in $\mathbb{R}^3$ by a height 
function $h : U \subset \mathbb{R}^2 \to \mathbb{R}$ over a 2D domain $U$ is the locus of the points 
where the gradient $\nabla h$ of $h$ is an eigenvector of the Hessian $\mathcal{H}$, and where the 
eigenvalue associated with the other eigenvector of the Hessian is positive (resp. 
negative). 
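
To make the definition concrete, here is a minimal Python sketch that applies it to a height function sampled on a regular grid. It is not one of the published ridge and valley finders: the function name, the finite-difference derivatives, and the tolerance test are assumptions made for illustration. The test "the gradient is an eigenvector of the Hessian" is implemented as "the Hessian applied to the gradient is parallel to the gradient", for a non-zero gradient.

```python
import numpy as np


def valley_ridge_masks(h, dx, dy, tol=1e-6):
    """Boolean masks of grid samples lying on valleys and on ridges of h.

    h is sampled on a regular grid; axis 0 is x and axis 1 is y.
    """
    hx, hy = np.gradient(h, dx, dy, edge_order=2)
    hxx, hxy = np.gradient(hx, dx, dy, edge_order=2)
    _, hyy = np.gradient(hy, dx, dy, edge_order=2)

    # Determinant of (Hessian * gradient) and the gradient: it vanishes
    # exactly where the two vectors are parallel.
    ax = hxx * hx + hxy * hy
    ay = hxy * hx + hyy * hy
    det = ax * hy - ay * hx

    # Second derivative of h along the direction perpendicular to the
    # gradient. When the gradient is an eigenvector, this is the
    # eigenvalue attached to the other eigenvector.
    g2 = hx**2 + hy**2
    lam = (hxx * hy**2 - 2 * hxy * hx * hy + hyy * hx**2) / np.maximum(g2, tol)

    on_curve = (np.abs(det) < tol) & (g2 > tol)
    return on_curve & (lam > tol), on_curve & (lam < -tol)


# A tilted trough h(x, y) = x^2 + y: its valley is the line x = 0.
x = np.linspace(-1.0, 1.0, 21)
y = np.linspace(0.0, 1.0, 11)
X, Y = np.meshgrid(x, y, indexing="ij")
valley, ridge = valley_ridge_masks(X**2 + Y, x[1] - x[0], y[1] - y[0])
print(np.argwhere(valley)[:3])
```

The tolerance test only works here because grid samples fall exactly on the valley; on real data one would rather look for sign changes of the determinant between neighboring samples.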

Other definitions of ridges and valleys have been given by various researchers, 
including Crowley and Parker [24] and Eberly, Gardner, Morse, Pizer and Schar- 
lach [27]. As shown by Koenderink [55], the definitions of Haralick et al. and 
Eberly et al. are in fact equivalent to a much earlier one proposed by de Saint- 
Venant in 1852 [92]. Koenderink’s paper also includes a modern account of the 
different 1912 definition due to Rothe [91] that better captures the intuitive 
notion that water should flow down the valley and not cross the course of a 
river, but unfortunately does not afford a local criterion for detecting ridges and 
valleys. 

We will use the definition of Haralick et al. since, as shown in Section 3.2, it 
can be used to derive a local geometric condition for a cross-section of a shape 
to participate in a ribbon or a generalized cylinder. It also has the advantage of 
having been implemented in several ridge and valley finders [34, 39]. Finally, as 
shown by Barraquand [6, 7] and detailed in Section 3.3, it is readily generalizable 
to higher dimensions, where it still yields one-dimensional valleys and ridges. 
This will prove particularly important for defining GCs. 



3.2 Ribbons 

Consider a 2D shape bounded by a curve $\Gamma$ defined by $x : I \to \mathbb{R}^2$ and parameterized 
by arc length. The line segment joining any two points $x(s_1)$ and $x(s_2)$ 
on $\Gamma$ defines a cross-section of the shape, with length $l(s_1, s_2) = |x(s_1) - x(s_2)|$. 
We can thus reduce the problem of studying the set of cross-sections of our shape 
to the problem of studying the topography of the surface $S$ associated with the 
height function $h : I^2 \to \mathbb{R}^+$ defined by $h(s_1, s_2) = \frac{1}{2} l(s_1, s_2)^2$. 
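
A discrete version of this height surface is easy to build from a sampled outline. The Python sketch below is one possible construction, assuming the outline is given as a closed polygon; the helper names and the linear-interpolation resampling are ours. It first resamples the boundary uniformly in arc length, then tabulates half the squared length of every cross-section.

```python
import numpy as np


def resample_by_arc_length(points, n):
    """Resample a closed polygon at n points equally spaced in arc length."""
    closed = np.vstack((points, points[:1]))          # repeat first vertex
    seg = np.linalg.norm(np.diff(closed, axis=0), axis=1)
    s = np.concatenate(([0.0], np.cumsum(seg)))       # arc length at vertices
    targets = np.linspace(0.0, s[-1], n, endpoint=False)
    return np.column_stack((np.interp(targets, s, closed[:, 0]),
                            np.interp(targets, s, closed[:, 1])))


def height_surface(samples):
    """h[i, j] = 0.5 * |x(s_i) - x(s_j)|^2 for all pairs of boundary samples."""
    diff = samples[:, None, :] - samples[None, :, :]
    return 0.5 * np.sum(diff**2, axis=-1)


# Example: a unit circle approximated by a 360-sided polygon.
angles = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)
circle = np.column_stack((np.cos(angles), np.sin(angles)))
samples = resample_by_arc_length(circle, 100)
h = height_surface(samples)
print(h.shape, h.max())
```

For the circle, the largest value of h is attained by diameters, whose length 2 gives h close to 2; the diagonal of the table is zero because a point paired with itself is a degenerate cross-section.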

The lowest (resp. highest) points of S correspond to places where the region 
bounded by $\Gamma$ is the narrowest (resp. the widest). More interestingly, the valleys 
(resp. ridges) of this surface correspond to narrow (resp. wide) subsets of the 
region, which leads to the following definition of ribbons. 

Definition 2. The ribbon associated with the shape bounded by some curve $\Gamma$ is 
the set of cross-sections whose end-points correspond to ridges and valleys of the 
associated surface S. The narrow (resp. wide) ribbon is the subset of the ribbon 
corresponding to a valley (resp. ridge) of S. 

At this point, let us make a few remarks: 

• The narrow ribbons are of course of primary interest for implementing two-dimensional 
GCs. However, we will see in the next section that wide ribbons also 
capture interesting shape properties, such as certain symmetries. 

• The description of a given shape is uniquely defined: in particular, it is 
obviously independent of the choice of arc-length origin. In fact, as shown in 







[37], the valleys of a surface are invariant through monotonic transformations of 
the height function (so replacing h by l for example would not change the ribbon 
associated with a shape). 

• Although this representation superficially “looks like” a skeleton [95] or 
medial axis [16], it is fundamentally different, since a shape is described by a 
set of line segments instead of a set of disks. This is indeed an instance of 2D 
generalized cylinder [13]. 

• As noted earlier, using valleys and ridges to define ribbons allows us to 
construct the ribbon description of a given shape by constructing a discrete 
version of the surface S, then using some implementation of a valley/ridge finder 
[34, 39] to find its valleys and ridges. 

Figure 2 shows the narrow ribbons found in synthetic and real images using 
two very simple valley finders that we have developed to conduct preliminary 
experiments. The top of the figure shows results from the first implementation 
with, from left to right: the ribbon of a worm-like object, the mid-points of its 
cross-sections (spine), another synthetic example, and the silhouette of a person 
and the spine of its narrow ribbons. The bottom part of the figure shows results 
from our second program with, from left to right: the height function associated 
with a bottle shape, the ribbon cross-sections, and the associated spine. 





Fig. 2. The narrow ribbons extracted from synthetic and real images. 



Formal properties. We first give a geometric criterion that two points must satisfy 
to define a ribbon pair. Let us parameterize the curve by arc-length, and define 
the unit vectors $u$, $v$ forming an orthonormal right-handed coordinate system, 
such that $x_1 - x_2 = l u$ (Fig. 3). We denote by $t_i$ and $n_i$ the unit tangent and 
normal at $x_i$ ($i = 1, 2$), and by $\theta_i$ the angle between the vectors $u$ and 
$t_i$. 











Fig. 3. Notation. 



Lemma 1. The ribbon associated with a two-dimensional shape is the set of 
cross-sections of this shape whose endpoints satisfy 

$$(\cos^2\theta_1 - \cos^2\theta_2)\cos(\theta_1 - \theta_2) + l \cos\theta_1 \cos\theta_2 \, (\kappa_1 \sin\theta_1 + \kappa_2 \sin\theta_2) = 0. \qquad (1)$$ 

This lemma follows from the definition of ridges and valleys and from the 
fact that the gradient $\nabla h$ is an eigenvector of the Hessian $\mathcal{H}$ if and only if it is 
a non-zero vector and 

$$(\mathcal{H} \nabla h) \times \nabla h = 0, \qquad (2)$$ 

where “×” denotes the operator associating to two vectors the determinant of 
their coordinates. Rewriting this condition in geometric terms yields (1) after 
some algebraic manipulation. It should be noted that the eigenvalues of the 
Hessian can also be expressed geometrically in terms of the distance $l$, the angles 
$\theta_1$, $\theta_2$ and the curvatures $\kappa_1$, $\kappa_2$. The formula, however, is a bit complicated and 
not particularly illuminating, and it is omitted here. 
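
The algebraic manipulation can be sketched as follows. This is our own reconstruction, not the authors' derivation, and it rests on sign conventions that are assumptions: the height function is half the squared cross-section length, $n_i$ is $t_i$ rotated by $+90^\circ$, the Frenet formula reads $dt_i/ds_i = \kappa_i n_i$, and $\theta_i$ is the signed angle from $u$ to $t_i$, so that $t_i = \cos\theta_i\, u + \sin\theta_i\, v$ and $n_i = -\sin\theta_i\, u + \cos\theta_i\, v$.

```latex
\begin{align*}
h &= \tfrac{1}{2}\,|x_1 - x_2|^2, \qquad x_1 - x_2 = l\,u,\\
h_1 &= (x_1 - x_2)\cdot t_1 = l\cos\theta_1, \qquad
h_2 = -(x_1 - x_2)\cdot t_2 = -l\cos\theta_2,\\
h_{11} &= 1 + l\,\kappa_1\,(u\cdot n_1) = 1 - l\,\kappa_1\sin\theta_1,\\
h_{22} &= 1 - l\,\kappa_2\,(u\cdot n_2) = 1 + l\,\kappa_2\sin\theta_2,\\
h_{12} &= -\,t_1\cdot t_2 = -\cos(\theta_1 - \theta_2),\\[4pt]
(\mathcal{H}\nabla h)\times\nabla h
 &= (h_{11}h_1 + h_{12}h_2)\,h_2 - (h_{12}h_1 + h_{22}h_2)\,h_1\\
 &= (h_{11} - h_{22})\,h_1 h_2 + h_{12}\,(h_2^2 - h_1^2)\\
 &= l^2\left[(\cos^2\theta_1 - \cos^2\theta_2)\cos(\theta_1 - \theta_2)
   + l\cos\theta_1\cos\theta_2\,(\kappa_1\sin\theta_1 + \kappa_2\sin\theta_2)\right].
\end{align*}
```

Since $l \neq 0$ for a proper cross-section, the determinant vanishes exactly when the bracketed expression does, which is condition (1).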

More interestingly, Lemma 1 allows us to compare our ribbons with various 
other types of symmetries. 

Lemma 2. Radial symmetries, bi-lateral symmetries, and worms are ribbons. In 
radial symmetries, concave pairs of symmetric points form narrow ribbons, and 
convex pairs form wide ribbons. Concave pairs of points in bi-lateral symmetries 
always form narrow ribbons. Worms are narrow ribbons. 

Lemma 2 shows that ribbons include some interesting shape classes (worms 
are ribbons obtained by sweeping a line segment with constant width perpen- 
dicular to some generating curve). For example, an ellipse admits the following 
description in terms of ribbons: one narrow ribbon corresponding to its ma- 
jor axis of symmetry, and two wide ribbons corresponding to its minor axis of 
symmetry and its radial symmetry. 

The lemma follows easily from (1), the formula for the Hessian’s eigenvalues 
mentioned earlier, and the angle and curvature properties of worms and radial or 
bi-lateral symmetries [80]. Note that (1) also provides another method for finding 
ribbons: construct a discrete version of the surface S, and find the zero-crossings 
of (1). 
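
As a rough illustration of this second method, the Python sketch below evaluates the left-hand side of (1) for pairs of points on an ellipse and checks that symmetric pairs make it vanish. The ellipse, the function names, and the sign conventions (counterclockwise traversal, signed angle measured from $u$ to the tangent, curvature positive for a convex counterclockwise curve) are assumptions chosen for the example.

```python
import numpy as np


def ellipse_point(t, a=2.0, b=1.0):
    """Position, unit tangent and curvature of a counterclockwise ellipse."""
    x = np.array([a * np.cos(t), b * np.sin(t)])
    d = np.array([-a * np.sin(t), b * np.cos(t)])     # derivative w.r.t. t
    speed = np.linalg.norm(d)
    return x, d / speed, a * b / speed**3


def ribbon_residual(p1, p2):
    """Left-hand side of condition (1) for two boundary points."""
    (x1, t1, k1), (x2, t2, k2) = p1, p2
    diff = x1 - x2
    l = np.linalg.norm(diff)
    u = diff / l
    v = np.array([-u[1], u[0]])                       # u rotated by +90 degrees
    c1, s1 = u @ t1, v @ t1                           # cos and sin of theta_1
    c2, s2 = u @ t2, v @ t2                           # cos and sin of theta_2
    cos_diff = c1 * c2 + s1 * s2                      # cos(theta_1 - theta_2)
    return (c1**2 - c2**2) * cos_diff + l * c1 * c2 * (k1 * s1 + k2 * s2)


t = 0.7
bilateral_major = ribbon_residual(ellipse_point(t), ellipse_point(-t))
bilateral_minor = ribbon_residual(ellipse_point(t), ellipse_point(np.pi - t))
radial = ribbon_residual(ellipse_point(t), ellipse_point(t + np.pi))
generic = ribbon_residual(ellipse_point(0.3), ellipse_point(2.0))
print(bilateral_major, bilateral_minor, radial, generic)
```

Sampling this residual over a grid of parameter pairs and tracing its sign changes gives candidate ribbon pairs; the sign of the remaining Hessian eigenvalue is still needed to separate narrow from wide ribbons.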







Part decomposition. As mentioned earlier, parts can be defined as the pieces of 
an object which are well approximated by some primitive shape [77], as the pieces 
of an object which are separated by some prototypical discontinuities [43, 85], 
or as a combination of both [48, 101]. Here we follow Binford’s suggestion and 
decompose complex shapes into ribbon parts whose cross-sections vary smoothly, 
so that cross-section area discontinuities delimit separate parts. 

Figure 4 shows examples using real images. The cross-section discontinuities 
are defined as valley endpoints as well as points where the width function is 
discontinuous. 




Fig. 4. Parts. From left to right: a hand, its parts, a person and his parts. 



Let us stress again that we do not claim that the results shown in Figs. 2 
and 4 are better than those that would have been obtained by previous methods 
[43, 48, 101]. Nor do we claim at this point to have a robust program for part 
decomposition. Rather, we believe that our preliminary results demonstrate the 
feasibility of the approach and pave the way toward achieving part decomposition 
for three-dimensional objects, as discussed in the next section. 

3.3 Generalized Cylinders 

Definition. The definition of valleys presented in Section 3.1 has been extended 
by Barraquand and his colleagues [6] to arbitrary dimensions, with applications 
in robotic planning [7]. The generalization is as follows: the valley (resp. ridge) 
of a hypersurface defined in $\mathbb{R}^{n+1}$ by a height function $h : U \subset \mathbb{R}^n \to \mathbb{R}$ over an 
n-dimensional domain U is the locus of the points where the gradient of h is an 
eigenvector of the Hessian, and where the eigenvalues associated with the other 
eigenvectors of the Hessian are positive (resp. negative). (Mixed eigenvalues yield 
topographic entities lying somewhere between valleys and ridges.) 

Barraquand has shown that valleys are (in general) one-dimensional, independent 
of the value of $n$. This is intuitively obvious since the gradient $\nabla h$ is 
an eigenvector of the Hessian $\mathcal{H}$ when the vectors $\nabla h$ and $\mathcal{H} \nabla h$ are parallel to 
each other. This can be expressed by a system of $n - 1$ equations in $n$ unknowns, 
defining a curve in the domain $U$. 

For example, in the case n = 3, the condition is again (2) where “×” denotes 
this time the cross product. This vector condition yields two independent scalar 
equations in the three domain parameters, hence a curve. 
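
The n-dimensional test can be written down directly. The following Python sketch is an illustration with names of our choosing, not Barraquand's implementation: it checks that the Hessian applied to the gradient is parallel to the gradient, then checks the sign of the Hessian restricted to the directions orthogonal to the gradient. The example also shows the n = 3 cross product, which is always orthogonal to the gradient and therefore carries only two independent components.

```python
import numpy as np


def is_valley_point(grad, hess, tol=1e-9):
    """True if grad is an eigenvector of hess and all other eigenvalues are > 0."""
    grad = np.asarray(grad, dtype=float)
    hess = np.asarray(hess, dtype=float)
    n = grad.size
    norm = np.linalg.norm(grad)
    if norm < tol:
        return False                      # the definition needs a non-zero gradient
    g = grad / norm
    hg = hess @ g
    # Component of H g orthogonal to g: zero iff g is an eigenvector.
    if np.linalg.norm(hg - (g @ hg) * g) > tol:
        return False
    # Orthonormal basis of the complement of g (columns 1..n-1 of Q).
    q, _ = np.linalg.qr(np.column_stack((g, np.eye(n))))
    basis = q[:, 1:]
    restricted = basis.T @ hess @ basis
    return bool(np.all(np.linalg.eigvalsh(restricted) > tol))


# h(x, y, z) = x^2 + y^2 + z: a trough whose valley is the z axis.
H = np.diag([2.0, 2.0, 0.0])
on_axis = np.array([0.0, 0.0, 1.0])       # gradient at (0, 0, z)
off_axis = np.array([2.0, 0.0, 1.0])      # gradient at (1, 0, z)

print(is_valley_point(on_axis, H), is_valley_point(off_axis, H))
cross = np.cross(H @ off_axis, off_axis)
print(cross, cross @ off_axis)            # orthogonal to the gradient
```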








We now show how the valleys of a height function defined over a three- 
dimensional domain can be used to define generalized cylinders. In the three- 
dimensional case, there is no natural parameterization of the cross-sections of a 
shape in terms of points on its boundary. Instead, we propose the following idea: 
consider a volume $V$ and the three-parameter family of all planes $P(s_1, s_2, s_3)$ 
in $\mathbb{R}^3$ (we will not worry about the choice of the plane parameterization for the 
time being). We consider the hypersurface $S$ of $\mathbb{R}^4$ associated with the height 
function $h(s_1, s_2, s_3) = \mathrm{Area}(V \cap P(s_1, s_2, s_3))$. 

In other words, h measures the area of the slice of the volume V contained 
in the plane P. The set of valleys of the surface S is as before a one-dimensional 
set, and we can associate with each point in a valley the slice of V associated 
with the corresponding plane. This yields a description of V in terms of planar 
cross-sections swept and deformed in space. 
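
For intuition about this height function, it helps to have a volume whose slice area is known in closed form. The Python sketch below uses an axis-aligned ellipsoid centered at the origin as a stand-in (our choice, not an example from the text), with planes parameterized by the spherical angles of their unit normal and their signed distance d to the center. The function names are hypothetical, and the closed form follows from mapping the ellipsoid to a unit sphere.

```python
import numpy as np


def plane_normal(theta, phi):
    """Unit normal with polar angle theta (from z) and azimuth phi."""
    return np.array([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)])


def ellipsoid_slice_area(semi_axes, theta, phi, d):
    """Area of the slice of an ellipsoid by the plane n . x = d."""
    axes = np.asarray(semi_axes, dtype=float)
    n = plane_normal(theta, phi)
    # Half-width of the ellipsoid along n: the plane misses it when |d| >= reach.
    reach = np.linalg.norm(axes * n)
    if abs(d) >= reach:
        return 0.0
    return np.pi * np.prod(axes) * (1.0 - (d / reach) ** 2) / reach


# Sanity check on a sphere of radius 2: the area is pi * (R^2 - d^2).
sphere_area = ellipsoid_slice_area((2, 2, 2), 0.4, 1.1, 1.0)

# Prolate spheroid elongated along z: slices through the center are
# smallest when the plane is orthogonal to the axis of revolution.
straight = ellipsoid_slice_area((1, 1, 3), 0.0, 0.0, 0.0)
tilted = ellipsoid_slice_area((1, 1, 3), 0.3, 0.0, 0.0)
print(sphere_area, straight, tilted)
```

For a general volume no such formula exists, and the height function would have to be tabulated numerically, for instance by slicing a polyhedral or voxel model.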

Definition 3. The generalized cylinder associated with a shape is the set of 
cross-sections corresponding to the valleys of the associated surface S. 

Time for a few more remarks: 

• The definition of a GC given above is not quite satisfying: it depends 
on the choice of plane parameterization used in defining the height function h. 
Changing the parameterization will not merely scale the values of h like the 
monotonic transformations mentioned in Section 3.2: it will deform the domain 
over which the surface S is defined and thus change its valleys. This difficulty 
stems from the fact that there is no obvious natural parameterization similar to 
arc length in the three-dimensional case. 

We propose to use a local parameterization of planes in the neighborhood of 
a given cross-section to guarantee that the GC description attached to a given 
shape is uniquely defined: we attach to each cross-section a reference frame 
whose origin is its center of mass and whose x, y axes are its inertia axes. This 
allows us to parameterize each plane in a neighborhood of the cross-section by 
the spherical coordinates of its normal and the signed distance to the origin. In 
turn, we can use this parameterization and the local criterion (2) to determine 
potential GC cross-sections of an object (of course, the sign of the eigenvalues 
of $\mathcal{H}$ must also be checked). The GC description of a given shape is obviously 
uniquely defined, independent of any global coordinate system. It is in fact easy 
to show that it is also independent of the particular choice of the coordinate axes 
in the cross-section plane. Note however that the representation (i.e., the height 
field and its valleys) depends on the particular local plane parameterization 
chosen, e.g., we could have used Cartesian coordinates instead of spherical ones 
for the normal. Likewise, we could have used a different origin for the coordinate 
frame associated to each cross-section: the only objective reason for choosing 
the center of mass is that it defines an origin uniquely attached to the shape. 
Clearly, more research is needed in this area, and we will investigate alternate 
plane parameterizations. 

• In the three-dimensional case, we do not make the distinction between 
“narrow” and “wide” generalized cylinders because it is not intuitively clear what 







the ridges of S would correspond to. Also, remember that mixed eigenvalue 
signs in the three-dimensional case may give rise to intermediate topographical 
entities whose properties are at this point poorly understood. 

• As in the two-dimensional case, the “spine” of our representation is the 
one-dimensional set of cross-sections, not a particular curve in three-dimensional 
space. If such a curve is desired, the locus of centers of mass of the cross-sections is 
a natural choice, given the parameterization proposed above. This is in agreement 
with Binford’s remarks quoted earlier [13]. 
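
The local reference frame proposed in the first remark above (origin at the center of mass of the cross-section, axes along its inertia axes) can be computed in closed form when a cross-section is available as a simple planar polygon. The Python sketch below assumes exactly that, with vertices expressed in some coordinate system of the cross-section plane; the function name is ours, and the standard polygon moment formulas are used.

```python
import numpy as np


def cross_section_frame(polygon):
    """Centroid and principal (inertia) axes of a simple planar polygon.

    Returns (centroid, axes) where the columns of `axes` are unit vectors,
    the first one along the direction of largest second moment.
    """
    p = np.asarray(polygon, dtype=float)
    x, y = p[:, 0], p[:, 1]
    xn, yn = np.roll(x, -1), np.roll(y, -1)           # next vertex
    c = x * yn - xn * y                               # per-edge cross products
    area = 0.5 * c.sum()
    if area < 0:                                      # clockwise input: reverse it
        return cross_section_frame(p[::-1])
    cx = ((x + xn) * c).sum() / (6.0 * area)
    cy = ((y + yn) * c).sum() / (6.0 * area)
    # Second moments of area about the origin.
    sxx = (c * (x * x + x * xn + xn * xn)).sum() / 12.0
    syy = (c * (y * y + y * yn + yn * yn)).sum() / 12.0
    sxy = (c * (x * yn + 2 * x * y + 2 * xn * yn + xn * y)).sum() / 24.0
    # Shift to the centroid (parallel axis theorem).
    m = np.array([[sxx - area * cx * cx, sxy - area * cx * cy],
                  [sxy - area * cx * cy, syy - area * cy * cy]])
    _, vecs = np.linalg.eigh(m)                       # ascending eigenvalues
    return np.array([cx, cy]), vecs[:, ::-1]


# A 4 x 2 rectangle centered at (1, 1): the long axis is the x direction.
rect = [(-1, 0), (3, 0), (3, 2), (-1, 2)]
centroid, axes = cross_section_frame(rect)
print(centroid, axes[:, 0])
```

The axes are defined only up to sign, and they become ill-defined when the two second moments coincide (a circular or square cross-section, for example).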

Where do we stand? We are still at a very early stage of our research for three- 
dimensional shapes. At this point, we have shown the following lemma. 

Lemma 3. Solids of revolution are generalized cylinders. 

The proof is given in [21]. Briefly, it computes analytically the area of the 
cross-section of a solid of revolution by an arbitrary plane, then goes on to show 
that cross-sections orthogonal to the axis direction are valley points. We plan to 
investigate other classes of GCs such as circular tubes (three-dimensional worms), 
SHGCs and canal surfaces (the envelopes of one-dimensional families of spheres). 

There are other properties of our GCs that we plan to investigate in the near 
future: 

• Stability: we plan to explore both theoretically and empirically the sta- 
bility of the representation. In particular, it would be desirable that the GC 
descriptions of two similar shapes also be similar. Characterizing the stability 
of any shape representation scheme is difficult (defining shape similarity is in 
itself a non-trivial problem). An instance of this problem is whether the GC 
description of a shape that can be approximated by a simple GC class (e.g., a 
solid of revolution) has a GC description similar to an instance of that class (in 
the solid of revolution example, roughly circular cross-sections swept along an 
almost straight line). As noted several times before, this is very important since, 
except in very special situations, real objects and/or their parts will not be exact 
instances of simple primitive shapes [42]. 

• Projective properties: do our generalized cylinders project onto ribbons? 
This is at present an open question, and we will try to give it a rigorous answer. 
Understanding the stability of the GC representation will provide partial answers 
(e.g., solids of revolution project to bi-lateral symmetries under weak perspective 
[72]). We will of course try to articulate more complete ones. 

We have implemented a simple program that uses the marching cubes algorithm 
[61] to detect valleys in a three-dimensional terrain, and Fig. 5 shows 
very preliminary experiments. Note that we have not implemented the local pa- 
rameterization of GCs, and our program uses a global plane parameterization 
instead. Much more work is of course needed in this area. 

Once we have developed robust algorithms for constructing GC descriptions 
of complex shapes, we will attack the part-decomposition problem using an ap- 
proach similar to the one developed in the two-dimensional case. More generally, 
the ultimate goal of this project is of course to develop methods using part-whole 





Fig. 5. Some simple shapes and their GC description. 

hierarchies describing objects and object classes for efficient indexing in object 
recognition [14, 74]. 

Another interesting issue is how to introduce a notion of scale in the representation. 
This would allow us to maintain hierarchies of part decompositions 
potentially useful in coarse-to-fine matching. At this point this remains an open 
problem. A different application of scale-space techniques to shape representa- 
tion is considered in the next section. 

4 Evolving Surfaces 

Imagine a smooth density function defined over a volume. The set of points 
where the density exceeds a given threshold defines a solid shape whose surface 
is a level set of the density. Blurring the density function will change its level 
sets and the shape it defines. When do “important” changes occur? This is the 
question addressed by Koenderink in the section dedicated to dynamic shape of 
his book [51], where he goes on to give several examples of structural changes 
under the name of morphological scripts. Recently, Bruce, Giblin and Tari [19] 
have given a complete classification of the structural changes in the parabolic set 
of a smooth surface undergoing a one-parameter family of deformations, as well 
as geometric and analytical conditions under which these transitions happen. 
We address in this section the problem of actually computing these transitions. 

4.1 Background 

The idea of capturing the significant changes in the structure of a geometric 
pattern as this pattern evolves under some family of deformations has a long 
history in computer vision. For example, Marr [65] advocated constructing the 
primal sketch representation of an image by keeping track of the patterns that do 
not change as the image is blurred at a variety of scales. The idea of recording 
the image changes under blurring led to the scale-space approach to image
analysis, which was first proposed by Witkin [117] in the case of inflections of a 
one-dimensional signal, and has since been applied to many domains, including 
the evolution of curvature patterns on curves [3, 58, 64] and surfaces [82], and 
more recently, the evolution of curves and level sets under diffusion processes 
[49, 96]. 








Koenderink’s and Van Doorn’s aspect graphs [52, 53] provide another example
where a geometric pattern can be characterized by its significant changes.
This time, the objective is to characterize the possible appearances of an object. 
The range of viewpoints is partitioned into maximal regions where the (qualita- 
tive) appearance of an object remains the same, separated by critical boundaries 
where the structure of the silhouette changes according to some visual event. The 
maximal regions (labeled by the object appearance at some sample point) are 
the nodes of the aspect graph, and the visual events separating them are its arcs. 
For smooth surfaces, a complete catalogue of visual events is available from sin- 
gularity theory [2, 52, 53, 87], and it has been used to compute the aspect graphs 
of surfaces of revolution [28, 56], algebraic surfaces [79, 88], and empirical sur- 
faces defined as the level set of volumetric density functions [76] (see [109] for 
related work). 

In his book [51], Koenderink addressed the problem of understanding the 
structural changes of the latter type of surfaces as the density function undergoes 
a diffusion process. He focused on the evolution of certain surface attributes, 
namely, the parabolic curves and their images via the Gauss map, which are 
significant for vision applications: for example, the intersection of a parabolic 
curve with the occluding contour of an object yields an inflection of the silhouette 
[50], and the asymptotic directions along the parabolic curves form the lip and 
beak-to-beak events of the aspect graph [47]. Koenderink proposed to define 
morphological scripts that record the possible transformations of a given shape 
and use these as a language for describing dynamic shape. Bruce, Giblin and Tari 
[19] have used singularity theory to expand Koenderink’s work and establish a 
complete catalogue of the singularities of the parabolic set under one-parameter 
families of deformations. We recall their results before presenting an approach 
to computing the critical events that they have identified. Once these events 
have been computed, we characterize the structure of the parabolic set and its 
Gaussian image at a sample point between each pair of critical events, which 
yields a data structure similar to the aspect graph but parameterized by time 
instead of viewpoint. 



4.2 Singularities of Evolving Surfaces 

This section introduces the parabolic set, its Gaussian image, and their singular- 
ities in the context of contact between planes and surfaces. It also summarizes 
the results of Bruce, Giblin and Tari [19]. 

Generic singularities. The intersection of a surface with the tangent plane at 
one of its points can take several forms (Fig. 6): for most points, this intersection 
is either reduced to the point itself (this is the case for elliptic points, where the 
surface has locally the shape of an ovoid or the inside of an egg shell, see Fig. 
6(a)) or composed of two curve branches intersecting transversally (this happens 
at hyperbolic points, where the surface has locally the shape of a saddle, see Fig. 
6(b)). Elsewhere, the intersection may consist of a curve that cusps at the point 
of interest (which is then said to be parabolic, see Fig. 6(c)), a unode (a double
extremum of the height function measured along the surface normal, see Fig.
6(d)), or a tacnode (two smooth curve branches joining tangentially, see Fig.
6(e)). Points corresponding to the last two cases are called godrons, ruffles [51],
or cusps of the Gauss map, because the curve traced on the Gauss sphere by the
unit surface normals along the parabolic curve has a cusp at these points.

For generic surfaces, there are no other possibilities; elliptic and hyperbolic
points form extended areas of the surface, separated by smooth curves made
of parabolic points; the cusps of Gauss are isolated points on these parabolic
curves.
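
As a small illustration of this classification, the following Python sketch labels a point of a surface given as a height function z = f(x, y) by the sign of f_xx f_yy - f_xy^2, which has the same sign as the Gaussian curvature. The function names, the tolerance and the sample surface f(x, y) = x^2 + y^3 are assumptions chosen for the example and do not come from the text.

```python
def classify_point(fxx, fyy, fxy, tol=1e-12):
    """Label a point from the second derivatives of the height function there.

    The sign of fxx*fyy - fxy**2 equals the sign of the Gaussian curvature.
    A zero value is reported as "parabolic"; on a generic surface this also
    covers the isolated cusps of Gauss lying on the parabolic curves.
    """
    d = fxx * fyy - fxy * fxy
    if d > tol:
        return "elliptic"
    if d < -tol:
        return "hyperbolic"
    return "parabolic"


def second_derivatives(x, y):
    """Second derivatives (fxx, fyy, fxy) of the sample surface f = x**2 + y**3."""
    return 2.0, 6.0 * y, 0.0


# The parabolic curve of this sample surface is the line y = 0.
for y in (1.0, -1.0, 0.0):
    print(y, classify_point(*second_derivatives(0.0, y)))
```

For this sample surface the elliptic region (y > 0) and the hyperbolic region (y < 0) are separated by the parabolic curve y = 0, in line with the generic picture described above.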







Fig. 6. Contact of a surface with its tangent plane: (a) an elliptic point, (b) a hyper- 
bolic point, (c) a parabolic point, (d) a unode, (e) a tacnode. 



Singularities of one-parameter families of surfaces. As shown in [19], there are a
few more possibilities in the case of one-parameter families of deforming surfaces:
indeed, three types of higher-order contact may occur at isolated values of the
parameter controlling the deformation. The corresponding singularities are called
A3, A4 and D4 transitions following Arnold’s conventions [2], and they affect
the structure of the parabolic set as well as its image under the Gauss map (Fig. 7).

There are four types of A3 transitions: in the first case, a parabolic loop 
disappears from the surface, and an associated loop with two cusps disappears
on the Gauss sphere (this corresponds to a lip event in catastrophe theory jargon, 
see Fig. 7(a)). In the second case, two smooth parabolic branches join, then split 
again into two branches with a different connectivity, while two cusping branches 
on the Gauss sphere merge then split again into two smooth branches (this is a 
beak-to-beak event, see Fig. 7(b)). Two additional singularities are obtained by 
reversing these transitions. 

At an A4 event, the parabolic curve remains smooth but its Gaussian image 
undergoes a swallowtail transition, i.e., it acquires a higher-order singularity that 
breaks off into two cusps and a crossing (Fig. 7(c)). Again, the transition may 
be reversed. Finally, there are four D4 transitions. In the first one (Fig. 7(d)), 
two branches of a parabolic curve meet then split again immediately; a similar 
phenomenon occurs on the Gauss sphere, with a cusp of Gauss “jumping” from 
one branch to the other. In the second transition (Fig. 7(e)), a parabolic loop 
shrinks to a point then expands again into a loop; on the Gauss sphere, a loop
with three cusps shrinks to a point then reappears. The transitions can as usual
be reversed.

Fig. 7. The singularities of evolving surfaces. The events are shown on both the surface
(left) and the Gauss sphere (right). The actual events are shown as black disks, and
the (generic) cusps of Gauss are shown as white disks. See text for more details.



4.3 Computing the Singularities 

We now present an approach to computing the singularities of evolving algebraic 
surfaces. We first derive the equations defining these singularities, then propose 
an algorithm for solving these equations and finally present results from our 
implementation. 



Singularity equations. Bruce, Giblin and Tari [19] give explicit equations for all
the singularities in the case of a surface defined by a height function $z = f(x, y)$.
Here we are interested in surfaces defined by some implicit equation $F(x, y, z) = 0$.
Although it is possible to use the chain rule to rederive the corresponding
equations in this case, this is a complex process [109] and we have chosen instead
to first construct the equations characterizing the parabolic curves and the cusps
of Gauss, which turn out to have a very simple form, then exploit the fact that
the parabolic set or its image under the Gauss map are singular at critical events
to characterize these events.

The normal to the surface defined by $F(x, y, z) = 0$ is of course given by the
gradient $\nabla F = (F_x, F_y, F_z)^T$. As shown in [79, 109, 114], the parabolic curves
are defined by $F(x, y, z) = P(x, y, z) = 0$, where $P = \nabla F^T A \nabla F$ and $A$
denotes the symmetric matrix

\[
A = \begin{pmatrix}
F_{yy}F_{zz} - F_{yz}^2 & F_{xz}F_{yz} - F_{zz}F_{xy} & F_{xy}F_{yz} - F_{yy}F_{xz} \\
F_{xz}F_{yz} - F_{zz}F_{xy} & F_{zz}F_{xx} - F_{xz}^2 & F_{xy}F_{xz} - F_{xx}F_{yz} \\
F_{xy}F_{yz} - F_{yy}F_{xz} & F_{xy}F_{xz} - F_{xx}F_{yz} & F_{xx}F_{yy} - F_{xy}^2
\end{pmatrix}.
\]

(Note that $A\mathcal{H} = |\mathcal{H}|\,\mathrm{Id}$, where $\mathcal{H}$ denotes the Hessian matrix associated to
$F$.)
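
The construction of the parabolic polynomial can be sketched in a few lines of plain Python. In this sketch, which is an illustration and not the Mathematica implementation used by the authors, a polynomial in x, y, z is stored as a dictionary mapping exponent triples to integer coefficients, A is built as the adjugate of the Hessian, and P is the quadratic form of A on the gradient. All function names and the sample surface z = x^2 + y^3 are assumptions made for the example.

```python
def padd(p, q, sign=1):
    """Return p + sign*q for polynomials stored as {(i, j, k): coeff}."""
    r = dict(p)
    for m, c in q.items():
        r[m] = r.get(m, 0) + sign * c
        if r[m] == 0:
            del r[m]
    return r


def pmul(p, q):
    """Return the product of two polynomials."""
    r = {}
    for (a, b, c), u in p.items():
        for (d, e, f), v in q.items():
            m = (a + d, b + e, c + f)
            r[m] = r.get(m, 0) + u * v
            if r[m] == 0:
                del r[m]
    return r


def pdiff(p, axis):
    """Partial derivative with respect to x (axis 0), y (1) or z (2)."""
    r = {}
    for m, c in p.items():
        if m[axis] > 0:
            n = list(m)
            n[axis] -= 1
            r[tuple(n)] = c * m[axis]
    return r


def parabolic_polynomial(F):
    """Return (P, A, H) with H the Hessian of F, A its adjugate, P = grad^T A grad."""
    g = [pdiff(F, i) for i in range(3)]
    H = [[pdiff(g[i], j) for j in range(3)] for i in range(3)]
    # Cyclic-index formula for the cofactors of a 3x3 matrix (signs included).
    A = [[padd(pmul(H[(i + 1) % 3][(j + 1) % 3], H[(i + 2) % 3][(j + 2) % 3]),
               pmul(H[(i + 1) % 3][(j + 2) % 3], H[(i + 2) % 3][(j + 1) % 3]), -1)
          for j in range(3)] for i in range(3)]
    P = {}
    for i in range(3):
        for j in range(3):
            P = padd(P, pmul(pmul(g[i], A[i][j]), g[j]))
    return P, A, H


# Sample surface F = z - x**2 - y**3, i.e. the graph of x**2 + y**3.
F = {(0, 0, 1): 1, (2, 0, 0): -1, (0, 3, 0): -1}
P, A, H = parabolic_polynomial(F)
print(P)  # the polynomial 12*y, so the parabolic curve is F = 0, y = 0
```

Two sanity checks follow from the definition: for the unit sphere P reduces to 16(x^2 + y^2 + z^2), which never vanishes on the surface, while for the cylinder x^2 + y^2 = 1 the polynomial P is identically zero, so every point is parabolic.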

The cusps of Gauss are points where the asymptotic direction along the
parabolic curve is tangent to this curve [51]. Let us show that the asymptotic
direction at a parabolic point is $\mathbf{a} = A\nabla F$. Asymptotic tangents are vectors of
the tangent plane that are self-conjugate. The first condition is obviously satisfied
at a parabolic point since $\mathbf{a} \cdot \nabla F = \nabla F^T A \nabla F = 0$. The second condition is also
obviously satisfied since $\mathbf{a}^T \mathcal{H} \mathbf{a} = |\mathcal{H}|(\mathbf{a} \cdot \nabla F) = 0$. Since the tangent to the
parabolic curve is given by $\nabla P \times \nabla F$, it follows that the cusps of Gauss are
given by $F = P = C = 0$, where $C = \nabla P^T A \nabla F$.

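Note that $\nabla P$ has a relatively simple form:

\[
\nabla P = \frac{\partial}{\partial \mathbf{x}} \left( \nabla F^T A \nabla F \right)
= \begin{pmatrix} \nabla F^T A_x \nabla F \\ \nabla F^T A_y \nabla F \\ \nabla F^T A_z \nabla F \end{pmatrix}
+ 2|\mathcal{H}| \nabla F,
\]

since $\frac{\partial}{\partial \mathbf{x}}(\nabla F^T)\,A = \mathcal{H}A = \mathrm{Det}(\mathcal{H})\,\mathrm{Id}$. In particular, this simplifies the expression
of $C$ since the $\nabla F$ term above cancels in the dot product. Similar simplifications
occur during the computation of the non-generic singularities below.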
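
The step behind this simplification can be spelled out as follows. The index notation ($x_1, x_2, x_3$ for $x, y, z$ and $\mathbf{e}_i$ for the $i$-th coordinate vector) is introduced here only for the derivation; it uses nothing beyond the symmetry of $A$ and $\mathcal{H}$ and the identity $\mathcal{H}A = |\mathcal{H}|\,\mathrm{Id}$.

```latex
\begin{align*}
\frac{\partial P}{\partial x_i}
  &= 2 \left( \frac{\partial \nabla F}{\partial x_i} \right)^{T} A \, \nabla F
     + \nabla F^{T} A_{x_i} \nabla F \\
  &= 2\, \mathbf{e}_i^{T} \mathcal{H} A \, \nabla F + \nabla F^{T} A_{x_i} \nabla F
   = 2\, |\mathcal{H}| \, F_{x_i} + \nabla F^{T} A_{x_i} \nabla F , \\
C = \nabla P^{T} A \, \nabla F
  &= \sum_{i=1}^{3} \left( \nabla F^{T} A_{x_i} \nabla F \right) (A \nabla F)_i
     + 2\, |\mathcal{H}| \, \nabla F^{T} A \, \nabla F .
\end{align*}
```

The last term equals $2|\mathcal{H}|P$, which vanishes on the parabolic set $P = 0$, so only the sum over the derivatives of $A$ has to be evaluated there.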

We are now ready to characterize these singularities. Note that in the case of
a surface undergoing a family of deformations parameterized by some variable $t$,
$F$, $P$ and $C$ are also functions of $t$. Let us first consider A3 and D4 transitions.
Since they yield singular parabolic curves, they must satisfy $F = P = 0$ and

\[
\nabla_{\mathbf{x}} F(x, y, z, t) \times \nabla_{\mathbf{x}} P(x, y, z, t) = 0,
\]

where $\nabla_{\mathbf{x}}$ denotes the gradient operator with respect to $\mathbf{x} = (x, y, z)^T$, and the third
equation simply states that the normals to the original surface and the “parabolic”
surface defined by $P = 0$ are parallel. This is a vector equation with three scalar
components, but only two of these components are linearly independent. It
follows that the singularities of the parabolic set are characterized by four equations
in four unknowns.

The case of A4 singularities is a little more complicated since the parabolic
set is smooth there. On the other hand, the curve defined in $\mathbb{R}^4$ by the cusps
of Gauss is singular, and the A4 singularities can thus be found by solving the
system of equations defined by $F = P = C = 0$ and

\[
\left| \nabla_{\mathbf{x}} F, \nabla_{\mathbf{x}} P, \nabla_{\mathbf{x}} C \right| = 0.
\]

Solving the equations. Our objectives are to compute all the critical events and 
characterize the structure of the parabolic set and its Gaussian image at a sample 
point between each pair of critical events. This yields a data structure similar 
to the aspect graph but parameterized by time instead of viewpoint. 
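
One possible shape for such a time-parameterized structure is sketched below in Python. It is only an illustration under stated assumptions: the class name `EvolutionGraph`, the function `build_graph` and the `describe` callback (standing in for whatever routine characterizes the parabolic set at a given parameter value) are hypothetical and are not taken from the authors' implementation.

```python
from dataclasses import dataclass, field


@dataclass
class EvolutionGraph:
    """Nodes are sample parameter values; arcs are the critical events between them."""
    nodes: list = field(default_factory=list)  # [(t_sample, structure_label), ...]
    arcs: list = field(default_factory=list)   # [(t_event, event_type), ...]


def build_graph(events, t_min, t_max, describe):
    """Place one sample point at the midpoint of each interval between events.

    `events` is a list of (t, event_type) pairs; `describe(t)` is a
    hypothetical callback returning a label for the structure at time t.
    """
    events = sorted(e for e in events if t_min < e[0] < t_max)
    cuts = [t_min] + [t for t, _ in events] + [t_max]
    graph = EvolutionGraph(arcs=events)
    for a, b in zip(cuts, cuts[1:]):
        t = (a + b) / 2
        graph.nodes.append((t, describe(t)))
    return graph


graph = build_graph([(3.0, "beak-to-beak"), (1.0, "lip")], 0.0, 4.0,
                    describe=lambda t: f"structure sampled at t={t}")
for node in graph.nodes:
    print(node)
```

With n critical events inside the parameter range, the sketch produces n + 1 nodes, each adjacent pair being separated by exactly one arc, which mirrors the node and arc roles of the aspect graph.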

All critical events are characterized by systems of four polynomial equations 
in four unknowns. They can thus be found using homotopy continuation [67], a 
global root finder that finds all the solutions (real or complex) of square systems 
of polynomial equations. 
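
To give a feel for how homotopy continuation proceeds, here is a deliberately reduced Python sketch for a single polynomial in one variable; the systems considered in the text have four equations in four unknowns and call for the general method of [67]. The sketch deforms the start polynomial x^d - 1, whose roots are known, into the target polynomial and follows each root with Newton corrections. The function names, the step count and the complex constant `gamma` (an arbitrary non-real value used to keep the paths apart) are assumptions made for the example.

```python
import cmath


def poly_eval(coeffs, x):
    """Horner evaluation of a polynomial (highest degree first) and its derivative."""
    p, dp = 0j, 0j
    for c in coeffs:
        dp = dp * x + p
        p = p * x + c
    return p, dp


def track_roots(target, steps=1000, newton_iters=5, gamma=0.6 + 0.8j):
    """Follow the roots of (1 - t)*gamma*(x**d - 1) + t*target(x) from t = 0 to 1."""
    d = len(target) - 1
    start = [1] + [0] * (d - 1) + [-1]  # coefficients of x**d - 1
    roots = [cmath.exp(2j * cmath.pi * k / d) for k in range(d)]
    for s in range(1, steps + 1):
        t = s / steps
        h = [(1 - t) * gamma * a + t * b for a, b in zip(start, target)]
        for k in range(d):
            x = roots[k]  # previous solution serves as the predictor
            for _ in range(newton_iters):
                p, dp = poly_eval(h, x)
                x -= p / dp
            roots[k] = x
    return roots


# Target polynomial x**2 - 5*x + 6, whose roots are 2 and 3.
for r in sorted(track_roots([1, -5, 6]), key=lambda z: z.real):
    print(r)
```

The point of the construction is that every root of the target is reached from one of the d known start roots, which is what makes the method a global root finder rather than a local one.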







Between singularities, the structure of the parabolic set does not change, 
and the curve tracing algorithm proposed in [57, 79] is used to determine its 
structure. Briefly, the algorithm traces an algebraic curve by first using homo- 
topy continuation to find all its extremal points (including singularities) in some 
direction, as well as sample points on the smooth branches bounded by those 
extrema, then marching numerically from the samples to the adjacent extrema. 
See [57, 79] for details. 

We have implemented this approach. All algebraic manipulations, including 
the computation of the result of Gaussian diffusion and linear morphing pro- 
cesses and the derivation of the singularity equations have been implemented in 
Mathematica. The singularities are computed using a parallel implementation 
of homotopy continuation [104] which allows the construction of the continua- 
tion paths to be distributed among a network of Unix workstations. The curve 
tracing algorithm described in [57, 79] has been implemented in Mathematica. 
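
For polynomial density functions both deformation processes can be computed in closed form. The Python sketch below is an illustration, not the authors' Mathematica code. It assumes that diffusion is modeled by the heat equation u_t = Laplacian(u), whose solution for polynomial initial data is the finite series sum over k of t^k Laplacian^k(F) / k!, and that convolving with an isotropic Gaussian of standard deviation sigma corresponds to t = sigma^2 / 2. Polynomials are stored as dictionaries from exponent triples to coefficients; all names are assumptions made for the example.

```python
def laplacian(p):
    """Laplacian of a polynomial stored as {(i, j, k): coeff} for x**i * y**j * z**k."""
    out = {}
    for mono, c in p.items():
        for axis in range(3):
            e = mono[axis]
            if e >= 2:
                m = list(mono)
                m[axis] -= 2
                m = tuple(m)
                out[m] = out.get(m, 0) + c * e * (e - 1)
    return {m: c for m, c in out.items() if c != 0}


def diffuse(p, t):
    """Apply exp(t * Laplacian) to p; the series stops once the Laplacian power vanishes."""
    result, term, k = dict(p), dict(p), 0
    while term:
        k += 1
        term = {m: c * t / k for m, c in laplacian(term).items()}
        for m, c in term.items():
            result[m] = result.get(m, 0) + c
    return result


def morph(p, q, s):
    """Linear morphing (1 - s)*p + s*q between two polynomials."""
    keys = set(p) | set(q)
    return {m: (1 - s) * p.get(m, 0) + s * q.get(m, 0) for m in keys}


# x**4 diffused for t = 0.5 becomes x**4 + 12*t*x**2 + 12*t**2 = x**4 + 6*x**2 + 3.
print(diffuse({(4, 0, 0): 1}, 0.5))
```

Because the result is again a polynomial in x, y, z whose coefficients are polynomials in the deformation parameter, the singularity equations stay polynomial, which is what allows them to be handed to a polynomial system solver.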

Figure 8(a) shows the results of applying Gaussian diffusion to a dimple- 
shaped quartic surface defined by 

\[ (4x^2 + 3y^2)^2 - 4x^2 - 5y^2 + 4z^2 - 1 = 0. \]

The surface and its parabolic curves are shown in the first row of the figure, 
and the corresponding Gaussian image is shown in the second one. As before, 
generic cusps of Gauss are shown as white discs. Singularities occur in the second, 
fourth and sixth column. There is a last singularity which is not shown in the 
figure, and that corresponds to the disappearance of the surface. 

Figure 8(b) shows the singularities found when linearly morphing the dimple 
into a squash-shaped surface defined by 

\[ 4y^4 + 3xy^2 - 5y^2 + 4z^2 + [\ldots] - 2xy + 2x + 3y - 1 = 0. \]

The evolving surface is shown in the first two rows, and its Gaussian image in 
the two bottom ones. As in the previous case, we have not found any A4 event.



4.4 Toward a Scale-Space Aspect Graph 

We show in this section preliminary results in an effort to characterize the change 
in visual appearance of an object as it undergoes a diffusion process. Approaches 
to the construction of such a scale-space aspect graph can be found in [29] in the 
two-dimensional polygonal case and [99] in the three-dimensional polyhedral 
case. Here we attack the case of curved surfaces formed by the zero set of poly- 
nomial density functions, focusing on the case of solids of revolution. Equations 
for the visual events of solids of revolution can be found in [28, 56]. These visual 
events form parallels of constant latitude on the unit sphere. When the diffusion 
parameter is added, the events transform into curves in $(\sigma, \beta)$ space, where $\beta$
denotes the angle between the axis of the solid of revolution and the viewing 
direction. 

Fig. 8. Evolving shapes: (a) diffusion of a dimple; (b) morphing the dimple into a
squash.

Computing the singularities of these curves for a sample solid of revolution
yields the scale-space aspect graph shown in Fig. 9. Note that multi-local visual
events have not been traced yet, so aspects that only differ through T-junctions
are considered equivalent. Figure 9 (top) shows the regions of the $(\sigma, \beta)$ plane
delimited by visual events. There are two “diffusion events” (dashed vertical lines
in the figure) in this case: the first one corresponds to the surface splitting into
two parts, and the second one corresponds to the disappearance of its parabolic
lines. A close-up of the diagram near these two events is shown in Fig. 9 (top-right).*
Note that the visual event curve with a vertical inflection switches from
beak-to-beak to lip at the inflection.

The two diffusion events separate three qualitatively different aspect graphs.
The aspects and visual events corresponding to the first one are shown in Fig.
9 (middle). Close-ups of the aspects and visual events associated with the second
one are shown in Fig. 9 (bottom). These are centered at the corresponding
singularities for clarity. Ignoring multi-local events, there is only one aspect for the
third aspect graph, and it is not shown in the figure.

* There are in fact two more diffusion events that occur for larger values of $\sigma$ and are
not shown in the figure. They correspond to the disappearance of the two parts of
the surface after splitting.

[Figure panels labeled (E), (F), (G), (H) and (AB), (BC), (CD).]

Fig. 9. Scale-space aspect graph. See text for details.











As in the case of generalized cylinders, much more work is needed here: this 
includes computing the multi-local visual events, generalizing our approach to 
algebraic surfaces that do not bound solids of revolution, and linking the changes 
in aspect graph structure to the singularities of evolving parabolic and flecnodal 
curves. An even more challenging problem is to consider the effect of diffusion 
processes applied to the image instead of the surface, which is a much more 
realistic model of optical blur. A lot is left to learn. 

References 

[1] G.J. Agin. Representation and description of curved objects. PhD thesis, Stan- 
ford University, Stanford, CA, 1972. 

[2] V.I. Arnol’d. Singularities of smooth mappings. Russian Math. Surveys, pages 
3-44, 1969. 

[3] H. Asada and J.M. Brady. The curvature primal sketch. IEEE Trans. Patt. 
Anal. Mach. Intell., 8(1):2-14, 1986.

[4] N. Ayache and O. Faugeras. Hyper: a new approach for the recognition and 
positioning of two-dimensional objects. IEEE Trans. Patt. Anal. Mach. Intell.,
8(1):44-54, January 1986.

[5] A.H. Barr. Superquadrics and angle preserving transformations. IEEE Computer 
Graphics and Applications, 1:11-23, January 1981. 

[6] J. Barraquand and M. Berthod. A non-linear second order edge detector. In 
International Image Week, Second Image Symposium, Nice, France, 1986. 

[7] J. Barraquand, B. Langlois, and J.C. Latombe. Robot motion planning with 
many degrees of freedom and dynamic constraints. In Int. Symp. on Robotics 
Research, pages 74-83, Tokyo, Japan, 1989. Preprints. 

[8] R. Basri, D. Roth, and D. Jacobs. Clustering appearance of 3D objects. In Proc.
IEEE Conf. Comp. Vision Patt. Recog., pages 414-420, Santa Barbara, CA, June 
1998. 

[9] B.C. Baumgart. Geometric modeling for computer vision. Technical Report 
AIM-249, Stanford University, 1974. Ph.D. Thesis. Department of Computer 
Science. 

[10] P.N. Belhumeur, J.P. Hespanha, and D.J. Kriegman. Eigenfaces vs. Fisherfaces:
recognition using class-specific linear projection. IEEE Trans. Patt. Anal. Mach.
Intell., 19(7):711-720, 1997.

[11] R. Bergevin and M. Levine. Part decomposition of objects from single view line 
drawings. CVGIP: Image Understanding, 55(3):17-34, 1992. 

[12] I. Biederman. Human image understanding: Recent research and a theory. Comp. 
Vis. Graph. Im. Proc., 32(l):29-73, 1985. 

[13] T.O. Binford. Visual perception by computer. In Proc. IEEE Conference on 
Systems and Control, 1971. 

[14] T.O. Binford. Body-centered representation and recognition. In M. Hebert, 
J. Ponce, T.E. Boult, and A. Gross, editors, Object Representation for
Computer Vision, number 994 in Lecture Notes in Computer Science, pages 207-215.
Springer-Verlag, 1995.

[15] T.O. Binford and T.S. Levitt. Quasi-invariants: theory and applications. In 
Proc. DARPA Image Understanding Workshop, pages 819-829, 1993. 







[16] H. Blum. A transformation for extracting new descriptors of shape. In
W. Wathen-Dunn, editor, Models for perception of speech and visual form. MIT 
Press, Cambridge, MA, 1967. 

[17] R.A. Brooks. Symbolic reasoning among 3-D models and 2-D images. Artificial 
Intelligence, 17(1-3):285-348, 1981.

[18] R.A. Brooks, R. Greiner, and T.O. Binford. Model-based three-dimensional in- 
terpretation of two-dimensional images. In Proc. International Joint Conference 
on Artificial Intelligence, pages 105-113, Tokyo, Japan, Aug. 1979. 

[19] J.W. Bruce, P.J. Giblin, and F. Tari. Parabolic curves of evolving surfaces. Int. 
J. of Comp. Vision, 17(3):291-306, 1996. 

[20] B. Burns, R. Weiss, and E. Riseman. The non-existence of general-case view- 
invariants. In Geometric Invariance in Computer Vision, pages 120-131. MIT 
Press, 1992. 

[21] M. Cepeda. Generalized cylinders revisited: theoretical results and preliminary 
implementation. Master’s thesis, Department of Computer Science, University
of Illinois at Urbana-Champaign, 1998. 

[22] H. Chiyokura and F. Kimura. Design of solids with free-form surfaces. Computer
Graphics, 17(3):289-298, Nov. 1983. 

[23] D.T. Clemens and D.W. Jacobs. Model group indexing for recognition. IEEE 
Trans. Patt. Anal. Mach. Intell., 13(10):1007-1017, 1991.

[24] J.L. Crowley and A.C. Parker. A representation of shape based on peaks and
ridges in the difference of low-pass transform. IEEE Trans. Patt. Anal. Mach.
Intell., 6:156-170, 1984.

[25] S. Dickinson, A.P. Pentland, and A. Rosenfeld. 3D shape recovery using
distributed aspect matching. IEEE Trans. Patt. Anal. Mach. Intell., 14(2):174-198,
1992.

[26] R. Duda and P. Hart. Pattern classification and scene analysis. Wiley, 1973. 

[27] D. Eberly, R. Gardner, B. Morse, S. Pizer, and C. Scharlach. Ridges for image
analysis. Technical report, Univ. of North Carolina, Dept. of Comp. Sc., 1993.

[28] D. Eggert and K. Bowyer. Computing the orthographic projection aspect graph 
of solids of revolution. In Proc. IEEE Workshop on Interpretation of 3D Scenes, 
pages 102-108, Austin, TX, November 1989. 

[29] D. Eggert, K. Bowyer, C. Dyer, H. Christensen, and D. Goldgof. The scale space 
aspect graph. IEEE Trans. Patt. Anal. Mach. Intell., 15(11):1114-1130, 1993.

[30] G. Farin. Curves and Surfaces for Computer Aided Geometric Design. Academic 
Press, San Diego, CA, 1993.

[31] O.D. Faugeras and M. Hebert. The representation, recognition, and locating of 
3-D objects. International Journal of Robotics Research, 5(3):27-52, Fall 1986. 

[32] D. Forsyth. Recognizing an algebraic surface from its outline. Int. J. of Comp. 
Vision, 18(1):21-40, April 1996.

[33] D.A. Forsyth and M.M. Fleck. Body plans. In Proc. IEEE Conf. Comp. Vision 
Patt. Recog., pages 678-683, San Juan, PR, June 1997. 

[34] J.M. Gauch and S.M. Pizer. Multiresolution analysis of ridges and valleys in 
grey-scale images. IEEE Trans. Patt. Anal. Mach. Intell., 15(6), June 1993.

[35] Z. Gigus, J. Canny, and R. Seidel. Efficiently computing and representing aspect 
graphs of polyhedral objects. IEEE Trans. Patt. Anal. Mach. Intell., 13(6), June
1991. 

[36] W.E.L. Grimson and T. Lozano-Perez. Localizing overlapping parts by searching
the interpretation tree. IEEE Trans. Patt. Anal. Mach. Intell., 9(4):469-482,
1987. 







[37] R.M. Haralick. Ridges and valleys in digital images. Comp. Vis. Graph. Im. 
Proc., 22:28-38, 1983. 

[38] R.M. Haralick and L.G. Shapiro. Computer and robot vision. Addison Wesley, 
1992. 

[39] R.M. Haralick, L.T. Watson, and T.J. Laffey. The topographic primal sketch. 
International Journal of Robotics Research, 2:50-72, 1983. 

[40] C. Harris and M. Stephens. A combined edge and corner detector. In 4th Alvey
Vision Conference, pages 189-192, Manchester, UK, 1988. 

[41] G. Healey and D. Slater. Global color constancy. J. Opt. Soc. Am. A, 
11(11):3003-3010, 1994. 

[42] M. Hebert, J. Ponce, T.E. Boult, A. Gross, and D. Forsyth. Report on the 
NSF/ARPA workshop on 3D object representation for computer vision. In 
M. Hebert, J. Ponce, T.E. Boult, and A. Gross, editors, Object Representation
for Computer Vision, Lecture Notes in Computer Science. Springer-Verlag,
1995. Also available through the World-Wide Web at the following address:
//www.ius.cs.cmu.edu/usr/users/hebert/www/workshop/report.html.

[43] D.D. Hoffman and W. Richards. Parts of recognition. Cognition, 18:65-96, 1984. 

[44] J. Hollerbach. Hierarchical shape description of objects by selection and modi- 
fication of prototypes. AI Lab. TR-346, MIT, 1975.

[45] R. Horaud and J.M. Brady. On the geometric interpretation of image contours. 
In Proc. Int. Conf. Comp. Vision, London, U.K., June 1987. 

[46] D.P. Huttenlocher and S. Ullman. Object recognition using alignment. In Proc. 
Int. Conf. Comp. Vision, pages 102-111, London, U.K., June 1987. 

[47] Y.L. Kergosien. La famille des projections orthogonales d’une surface et ses 
singularités. C.R. Acad. Sc. Paris, 292:929-932, 1981.

[48] B.B. Kimia, A.R. Tannenbaum, and S.W. Zucker. The shape triangle: parts, 
protrusions and bends. Technical Report TR-92-15, McGill University Research 
Center for Intelligent Machines, 1992. 

[49] B.B. Kimia, A.R. Tannenbaum, and S.W. Zucker. Shapes, shocks, and defor- 
mations 1: the components of shape and the reaction-diffusion space. Int. J. of 
Comp. Vision, 15:189-224, 1995. 

[50] J.J. Koenderink. What does the occluding contour tell us about solid shape? 
Perception, 13:321-330, 1984. 

[51] J.J. Koenderink. Solid Shape. MIT Press, Cambridge, MA, 1990. 

[52] J.J. Koenderink and A.J. Van Doorn. The singularities of the visual mapping.
Biological Cybernetics, 24:51-59, 1976.

[53] J.J. Koenderink and A.J. Van Doorn. The internal representation of solid shape
with respect to vision. Biological Cybernetics, 32:211-216, 1979.

[54] J.J. Koenderink and A.J. Van Doorn. Representation of local geometry in the
visual system. Biological Cybernetics, 55:367-375, 1987.

[55] J.J. Koenderink and A.J. Van Doorn. Local features of smooth shapes: Ridges
and courses. In Geometric Methods in Computer Vision II, pages 2-13, 1993. 

[56] D.J. Kriegman and J. Ponce. Computing exact aspect graphs of curved objects: 
solids of revolution. Int. J. of Comp. Vision, 5(2):119-135, 1990. 

[57] D.J. Kriegman and J. Ponce. A new curve tracing algorithm and some applica- 
tions. In P.J. Laurent, A. Le Mehaute, and L.L. Schumaker, editors, Curves and
Surfaces, pages 267-270. Academic Press, New York, 1991. 

[58] M. Leyton. A process grammar for shape. Artificial Intelligence, 34:213-247, 
1988. 







[59] J. Liu, J.L. Mundy, D. Forsyth, A. Zisserman, and C. Rothwell. Efficient recog- 
nition of rotationally symmetric surfaces and straight homogeneous generalized 
cylinders. In Proc. IEEE Conf. Comp. Vision Patt. Recog., pages 123-128, New
York City, NY, 1993. 

[60] C. Loop. Smooth spline surfaces over irregular meshes. Computer Graphics, 
pages 303-310, Aug. 1994. 

[61] W. Lorensen and H. Cline. Marching cubes: a high resolution 3D surface con- 
struction algorithm. Computer Graphics, 21:163-169, 1987. 

[62] D. Lowe. The viewpoint consistency constraint. Int. J. of Comp. Vision, 1(1):57- 
72, 1987. 

[63] D. Lowe and T.O. Binford. Segmentation and aggregation: An approach to 
figure-ground phenomena. In Proc. DARPA Image Understanding Workshop, 
pages 168-178, 1982. 

[64] A. Mackworth and F. Mokhtarian. The renormalized curvature scale space and 
the evolution properties of planar curves. In Proc. IEEE Conf. Comp. Vision 
Patt. Recog., pages 318-326, 1988. 

[65] D. Marr. Vision. Freeman, San Francisco, 1982. 

[66] D. Marr and K. Nishihara. Representation and recognition of the spatial organi- 
zation of three-dimensional shapes. Proc. Royal Society, London, B-200:269-294, 
1978. 

[67] A.P. Morgan. Solving Polynomial Systems using Continuation for Engineering 
and Scientific Problems. Prentice Hall, Englewood Cliffs, NJ, 1987. 

[68] Y. Moses and S. Ullman. Limitations of non model-based recognition schemes. 
In Proc. European Conf. Comp. Vision, pages 820-828, 1992. 

[69] J.L. Mundy and A. Zisserman. Geometric Invariance in Computer Vision. MIT 
Press, Cambridge, Mass., 1992. 

[70] J.L. Mundy, A. Zisserman, and D. Forsyth. Applications of Invariance in
Computer Vision, volume 825 of Lecture Notes in Computer Science. Springer-Verlag,
1994.

[71] H. Murase and S. Nayar. Visual learning and recognition of 3D objects from 
appearance. Int. J. of Comp. Vision, 14(1):5-24, 1995.

[72] V.S. Nalwa. Line-drawing interpretation: bilateral symmetry. In Proc. DARPA 
Image Understanding Workshop, pages 956-967, Los Angeles, CA, February 
1987. 

[73] V.S. Nalwa. A guided tour of computer vision. Addison Wesley, 1993. 

[74] R. Nevatia and T.O. Binford. Description and recognition of complex curved 
objects. Artificial Intelligence, 8:77-98, 1977. 

[75] Quang-Loc Nguyen and M.D. Levine. Representing 3D objects in range images 
using geons. Computer Vision and Image Understanding, 63(1):158-168, January 
1996. 

[76] A. Noble, D. Wilson, and J. Ponce. On Computing Aspect Graphs of Smooth 
Shapes from Volumetric Data. Computer Vision and Image Understanding: spe- 
cial issue on Mathematical Methods in Biomedical Image Analysis, 66(2):179- 
192, 1997. 

[77] A.P. Pentland. Perceptual organization and the representation of natural form. 
Artificial Intelligence, 28:293-331, 1986. 

[78] S. Petitjean. Géométrie énumérative et contacts de variétés linéaires: application
aux graphes d’aspects d’objets courbes. PhD thesis, Institut National
Polytechnique de Lorraine, 1995.

[79] S. Petitjean, J. Ponce, and D.J. Kriegman. Computing exact aspect graphs of 
curved objects: Algebraic surfaces. Int. J. of Comp. Vision, 9(3):231-255, 1992. 







[80] J. Ponce. On characterizing ribbons and finding skewed symmetries. Comp. Vis. 
Graph. Im. Proc., 52:328-340, 1990. 

[81] J. Ponce. Straight homogeneous generalized cylinders: differential geometry and 
uniqueness results. Int. J. of Comp. Vision, 4(1):79-100, 1990. 

[82] J. Ponce and J.M. Brady. Toward a surface primal sketch. In T. Kanade, editor, 
Three-dimensional machine vision, pages 195-240. Kluwer Publishers, 1987. 

[83] J. Ponce and D. Chelberg. Finding the limbs and cusps of generalized cylinders. 
Int. J. of Comp. Vision, 1(3):195-210, October 1987.

[84] J. Ponce, D. Chelberg, and W. Mann. Invariant properties of straight homoge- 
neous generalized cylinders and their contours. IEEE Trans. Patt. Anal. Mach. 
Intell., 11(9):951-966, September 1989.

[85] W. Richards, J.J. Koenderink, and D.D. Hoffman. Inferring 3D shapes from 2D 
codons. MIT AI Memo 840, MIT Artificial Intelligence Lab, 1985. 

[86] M. Richetin, M. Dhome, J.T. Lapreste, and G. Rives. Inverse perspective trans- 
form from zero-curvature curve points: Application to the localization of some 
generalized cylinders from a single view. IEEE Trans. Patt. Anal. Mach. Intell.,
13(2):185-191, February 1991. 

[87] J.H. Rieger. On the classification of views of piecewise-smooth objects. Image 
and Vision Computing, 5:91-97, 1987. 

[88] J.H. Rieger. Global bifurcations sets and stable projections of non-singular al- 
gebraic surfaces. Int. J. of Comp. Vision, 7(3):171-194, 1992. 

[89] L.G. Roberts. Machine perception of three-dimensional solids. In J.T. Tippett
et al., editors, Optical and Electro-Optical Information Processing, pages 159-197.
MIT Press, Cambridge, 1965.

[90] A. Rosenfeld. Axial representations of shape. Comp. Vis. Graph. Im. Proc., 
33:156-173, 1986. 

[91] R. Rothe. Darstellende Geometrie des Geländes. 1914.

[92] De Saint-Venant. Surfaces à plus grande pente constituées sur des lignes courbes.
Bulletin de la soc. philomath. de Paris, March 1852.

[93] H. Sato and T.O. Binford. Finding and recovering SHGC objects in an edge 
image. CVGIP: Image Understanding, 57:346-358, 1993. 

[94] C. Schmid and R. Mohr. Local grayvalue invariants for image retrieval. IEEE 
Trans. Patt. Anal. Mach. Intell., 19(5):530-535, May 1997. 

[95] J. Serra. Image Analysis and Mathematical Morphology. Academic Press, New 
York, 1982. 

[96] J.A. Sethian. Level set methods: evolving interfaces in geometry, fluid mechanics, 
computer vision and materials sciences. Cambridge University Press, 1996. 

[97] S.A. Shafer. Shadows and Silhouettes in Computer Vision. Kluwer Academic 
Publishers, 1985. 

[98] S.A. Shafer and T. Kanade. The theory of straight homogeneous generalized 
cylinders and a taxonomy of generalized cylinders. Technical Report CMU-CS- 
83-105, Carnegie-Mellon University, 1983. 

[99] I. Shimshoni and J. Ponce. Finite-resolution aspect graphs of polyhedral objects. 
IEEE Trans. Patt. Anal. Mach. Intell., 19(4):315-327, 1997. 

[100] L. Shirman and C. Sequin. Local surface interpolation with Bézier patches. 
CAGD, 4:279-295, 1987. 

[101] K. Siddiqi and B.B. Kimia. Parts of visual form: computational aspects. IEEE 
Trans. Patt. Anal. Mach. Intell., 17(3):239-251, March 1995. 

[102] D. Slater and G. Healey. Recognizing 3-d objects using local color invariants. 
IEEE Trans. Patt. Anal. Mach. Intell., 18(2):206-210, 1996. 







[103] B.I. Soroka and R. Bajcsy. Generalized cylinders from cross-sections. In Third 
Int. J. Conf. Patt. Recog., pages 734-735, 1976. 

[104] D. Stam. Distributed homotopy continuation and its application to robotic 
grasping. Master’s thesis, University of Illinois at Urbana-Champaign, 1992. 
Beckman Institute Tech. Report UIUC-BI-AI-RCV-92-03. 

[105] J. Stewman and K.W. Bowyer. Creating the perspective projection aspect graph 
of polyhedral objects. In Proc. Int. Conf. Comp. Vision, pages 495-500, Tampa, 
FL, 1988. 

[106] S. Sullivan and J. Ponce. Automatic model construction, pose estimation, and 
object recognition from photographs using triangular splines. In Proc. Int. Conf. 
Comp. Vision, pages 510-516, 1998. 

[107] S. Sullivan and J. Ponce. Automatic model construction, pose estimation, and 
object recognition from photographs using triangular splines. IEEE Trans. Patt. 
Anal. Mach. Intell., 20(10), Oct. 1998. In press. 

[108] S. Sullivan, L. Sandford, and J. Ponce. Using geometric distance fits for 3D object 
modelling and recognition. IEEE Trans. Patt. Anal. Mach. Intell., 16(12):1183-1196, 
December 1994. 

[109] J.P. Thirion and G. Gourdon. The 3D marching lines algorithm: new results and 
proofs. Technical Report 1881-1, INRIA, 1993. 

[110] R.Y. Tsai. A versatile camera calibration technique for high-accuracy 3D ma- 
chine vision metrology using off-the-shelf TV cameras. Journal of Robotics and 
Automation, RA-3(4):323-344, 1987. 

[111] M. Turk and A.P. Pentland. Face recognition using eigenfaces. J. of Cognitive 
Neuroscience, 3(1), 1991. 

[112] S. Ullman and R. Basri. Recognition by linear combination of models. IEEE 
Trans. Patt. Anal. Mach. Intell., 13(10):992-1006, 1991. 

[113] F. Ulupinar and R. Nevatia. Using symmetries for analysis of shape from contour. 
In Proc. Int. Conf. Comp. Vision, pages 414-426, Tampa, FL, December 1988. 

[114] C.E. Weatherburn. Differential geometry. Cambridge University Press, 1927. 

[115] I. Weiss. Projective invariants of shapes. In Proc. IEEE Conf. Comp. Vision 
Patt. Recog., pages 291-297, Ann Arbor, MI, 1988. 

[116] M. Wertheimer. Laws of organization in perceptual forms. Psychol. Forsch., 
4:301-350, 1923. English translation in: W.D. Ellis, A source book of Gestalt 
psychology, pages 71-88, 1973. 

[117] A.P. Witkin. Scale-space filtering. In Proc. International Joint Conference on 
Artificial Intelligence, pages 1019-1022, Karlsruhe, Germany, 1983. 

[118] M. Zerroug and G. Medioni. The challenge of generic object recognition. In 
M. Hebert, J. Ponce, T.E. Boult, and A. Gross, editors, Object Representation 
for Computer Vision, number 994 in Lecture Notes in Computer Science, pages 
217-232. Springer-Verlag, 1995. 

[119] M. Zerroug and R. Nevatia. Using invariance and quasi-invariance for the 
segmentation and recovery of curved objects. In J.L. Mundy, A. Zisserman, and 
D. Forsyth, editors, Applications of Invariance in Computer Vision, volume 825 
of Lecture Notes in Computer Science, pages 317-340. Springer-Verlag, 1994. 

[120] A. Zisserman, D.A. Forsyth, J.L. Mundy, and C.A. Rothwell. Recognizing general 
curved objects efficiently. In J. Mundy and A. Zisserman, editors, Geometric 
Invariance in Computer Vision, pages 228-251. MIT Press, Cambridge, Mass., 
1992. 




Order Structure, Correspondence, and 
Shape Based Categories 



Stefan Carlsson 

Numerical Analysis and Computing Science, Royal Institute of Technology (KTH), 

S-100 44 Stockholm, Sweden 

stefanc@nada.kth.se, http://www.nada.kth.se/~stefanc 



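Abstract. We propose a general method for finding pointwise correspondence 
between 2-D shapes based on the concept of order structure and using 
geometric hashing. The problem of finding correspondence and the problem of 
establishing shape equivalence can be considered as one and the same problem. 
Given shape equivalence, we can in general find pointwise correspondence, and 
the existence of an unambiguous correspondence mapping can be used as a rule 
for deciding shape equivalence. As a measure of shape equivalence we will use 
the concept of order structure, which in principle can be defined for arbitrary 
geometric configurations such as points, lines and curves. The order structure 
equivalence of subsets of points and tangent directions of a shape will be used 
to establish pointwise correspondence. The finding of correspondence between 
different views of the same object and different instances of the same object 
category can be used as a foundation for establishment and recognition of 
visual categories. 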

1 Introduction 

The problem of computing visual correspondence is central to many applications 
in computer vision, but it is notoriously difficult compared to the ease with which we 
solve it perceptually when looking at images. Given two images of the same object 
from different viewpoints, or even two different instances of the same object 
category, humans are in general able to establish a point-to-point correspondence 
between the images. In an example such as Fig. 1 there is no “correct” geometric 
answer to the correspondence problem. This raises the question of what rules 
actually govern the establishment of correspondence. One alternative would be 
that the objects in the images are recognised as birds and point correspon- 
dence is established based on correspondence of part primitives such as head, 
throat etc. which in general have invariant relations for the category of birds. 
Another alternative, however, would be to assume that correspondence is 
established based on some general similarity measure of image shapes, independent 
of the fact that the images are recognised as birds. In that case, correspondence 
can be used as a basis for recognition and categorisation instead of the other way 
around. The two alternatives can easily be seen to be connected to alternative 
theories of object recognition and categorisation that have been proposed over 



D.A. Forsyth et al. (Eds.): Shape, Contour LNCS 1681, pp. 58-71, 1999. 
© Springer-Verlag Berlin Heidelberg 1999 








Fig. 1. By looking at pictures of instances of the same category of objects we 
are in general able to establish point to point correspondence 



the years: 1. Recognition by components (RBC), where objects are described as 
composed of generic parts that are recognised in the image (Marr and Nishihara 
1978, Biederman 1985) and 2. Recognition based on similarity of views (Bülthoff 
et al. 1995, Tarr et al. 1995) which can form a basis for object categorisation 
(Edelman 1998). In this paper we will take the second point of view and define 
a general shape concept that can be used in the definition and computation of 
correspondence. 

Historically the problem of computing correspondence between deformed 
shapes has been formulated as finding the minimum of a cost function defined 
on the two shapes (Burr 1981, Yoshida and Sakoe 1982). For a recent review see 
(Basri et al. 1998). This is most often limited to the case when the shapes are 
described by their outlines so that the matching is a problem of curve matching. 
The need to a priori extract the outlines of the shapes limits the applicability 
of these methods. The definition of the cost function and the complexity of the 
matching are other problems that have to be faced. 

The approach we will propose does not require any pre-segmentation of shape 
outlines but works for arbitrary 2-D shapes. The input data to the algorithm 
will consist of groups of image coordinates with an associated tangent direction 
that can be obtained with extremely simple means. The input data structure has 
been simplified deliberately in order to avoid any complex perceptual grouping 
stage. 

2 Order Structure 

Order structure can be seen as the generalisation of the concept of ordering in 
one dimension to several dimensions (Goodman and Pollack 1983, Björner et al. 
1993). Order properties can be defined for point sets and other algebraically 
defined sets of features in arbitrary dimension. In (Carlsson 1996) the concept was 
used for sets of line segments in an algorithm for partial view invariant 
recognition of simple categories. It is also closely related to the idea used to describe 
qualitative difference of views in the concept of the aspect graph (Koenderink 
and van Doorn 1979). For computer vision problems, geometric structure can 
be described in a unified stratified way ranging from metric, affine and projec- 
tive to that of order and incidence structure (Carlsson 1997). This can also be 
seen as going from quantitative detailed descriptions to qualitative and general 
ones which is necessary if we want to describe equivalence of instances of object 
categories or different viewpoints of an object. 

Any structure concept defines equivalence classes of geometric shapes. Order 
structure seems to imply equivalence classes that are in accordance with those 
that are subjectively reported by humans, which makes it especially interesting 
to use for problems of visual recognition and categorisation. 



2.1 Capturing Perceptual Shape Equivalence 

If we look at the sequence of deformed shapes A-F in fig 2 we see that they are 
easily divided into two qualitatively different classes by the fact that the shape 
changes between C and D from a pure convex to a shape with a concavity. The 
concept of order structure can be used to capture the perceptual equivalence 
classes of shapes A-C and D-G respectively. If we sample the shapes D and F at 
five points and draw the tangents at those points, fig. 3, the qualitative structure 
of combined arrangement of points and lines can be seen to be in agreement in 
the sense that there is a one to one correspondence between the points of the 
two shapes such that the intersection of corresponding tangent lines are ordered 
relative to each other and to the points in exactly the same way. This relative 
ordering can be given a formal treatment using the concept of order structure 
for the arrangement of points and lines. The same order structure of points and 




A B C D E F G 



Fig. 2. Sequence of successive deformations 



tangents can be obtained by sampling any of shapes D-F at five points. Note 
that the sampling points are in general in perceptual correspondence between 
different shapes. It is not possible to obtain the same order structure by sampling 
shapes A, B or C, however, because they are convex and lack the 
concavity of shapes D-F. The fact that subsets of points and tangent lines can 
be found with the same order structure is a potential tool for classifying shapes 
into equivalence classes and for establishing point-to-point correspondence. Note 
that there are other subsets of five points and tangents that have the same order 
structure through all of the shapes A-F. Fig. 4 illustrates that, given perceptually 
corresponding points of two shapes, we can get equivalent order structure based 
on points and tangent lines. 

Fig. 3. Equivalent order structure from points and tangent lines of shapes D and G 

2.2 Point Sets 

The order structure of a set of points and lines can be used to define equivalence 
classes of shapes whose members are qualitatively or topologically similar, i.e. 
they can be obtained from each other by a deformation that preserves order 
structure. Order structure is a natural extension of the concepts of affine and 
projective structure in the sense that all affine and projective transformations 
that preserve orientation also preserve order structure. The class of order structure 
preserving transformations or deformations is, however, much larger than the classes 
of affine and projective transformations. For a set of points in the plane, order 
structure is denoted order type. The order structure defining property for point 
sets is that of orientation. Three points 1,2,3 have positive orientation if travers- 
ing them in order means anti-clockwise rotation. The orientation is negative if 
the rotation is clockwise. This can be established formally by computing the sign 
of the determinant: 



$\operatorname{sign}\,[p_1\ p_2\ p_3]$ (1) 







Fig. 4. Equivalent order structure from points and tangent lines 



of the oriented homogeneous coordinates: 

$p_i = (x_i,\ y_i,\ 1)^T, \quad i = 1, 2, 3$ (2) 




For an arbitrary set of points, the order type is uniquely determined by the set 
of mappings: 



$\chi_p(i,j,k) = \operatorname{sign}\,[p_i\ p_j\ p_k] \in \{-1, 1\}$ (3) 

for all points i,j,k in the set. The various order types that exist for 3,4 and 
5 points are shown in fig. 5. Note that for five points we get unique canonical 
orderings of the points for the first two order types. For the third one, ordering 
is ambiguous up to a cyclic permutation. 
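
The sign computation in (1) and the mappings in (3) can be sketched in a few lines. This is only an illustration under stated assumptions: points are given as plain (x, y) pairs and lifted to (x, y, 1), and the names `orientation` and `order_type` are ours, not part of the original algorithm. Collinear triplets, which the text does not consider, return 0 here.

```python
from itertools import combinations


def orientation(p1, p2, p3):
    """Sign of det[p1 p2 p3] for points (x, y) lifted to (x, y, 1).

    +1 means traversing p1, p2, p3 is an anti-clockwise rotation,
    -1 means clockwise, 0 means the three points are collinear.
    """
    det = ((p2[0] - p1[0]) * (p3[1] - p1[1])
           - (p2[1] - p1[1]) * (p3[0] - p1[0]))
    return (det > 0) - (det < 0)


def order_type(points):
    """Map every index triplet (i, j, k), i < j < k, to its orientation sign."""
    return {
        triplet: orientation(*(points[i] for i in triplet))
        for triplet in combinations(range(len(points)), 3)
    }


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
# An orientation preserving affine map (positive determinant) of the square.
sheared = [(2 * x + y + 3, x + 3 * y - 1) for x, y in square]
same = order_type(square) == order_type(sheared)
```

With these inputs `same` is true, in line with the statement that orientation preserving affine transformations preserve order structure; a mirror image flips every sign.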

2.3 Sets of Lines 

Order structure for a set of lines in the plane can be defined in essentially the 
same way as for points by using the fact that they are projectively dual. Some 
care must be exercised however since order structure relies on oriented projec- 
tive geometry where duality does not hold strictly. If p are the homogeneous 
coordinates for a point, and q the homogeneous line coordinates of a line passing 
through p, we have 

$q^T p = 0$ (4) 




Fig. 5. Order types and canonical orderings for 3, 4, and 5 points. “ ” means 
that ordering is ambiguous up to cyclic permutations 



In order to define order structure for lines in the same way as for points we have 
to normalise the homogeneous line coordinates q in some way. This will be done 
by choosing: 

$q = (a,\ 1,\ b)^T$ (5) 

which means that we consider lines of the form: 

$ax + y + b = 0$ (6) 

The vertical line x + b = 0 therefore plays the same role as the point at infinity 
in the point case. All lines therefore become oriented relative to the vertical. The 
order type for a set of lines can now be represented by the signs: 

$\chi_l(i,j,k) = \operatorname{sign}\,[q_i\ q_j\ q_k] \in \{-1, 1\}$ (7) 



for all triplets of lines i,j, k in the set. 
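
As a rough sketch of how the line signs could be computed, the following assumes that each line is given by a point on it and a tangent direction, and that it is normalised to the form ax + y + b = 0, i.e. q = (a, 1, b). The helper names `line_coords`, `det3` and `line_orientation` are hypothetical. Vertical lines cannot be normalised in this way and are rejected.

```python
def line_coords(point, tangent):
    """Normalised coordinates (a, 1, b) of the line a*x + y + b = 0
    through `point` with direction `tangent`."""
    (x, y), (tx, ty) = point, tangent
    if tx == 0:
        raise ValueError("a vertical line has no (a, 1, b) normalisation")
    a = -ty / tx
    b = -(a * x + y)
    return (a, 1.0, b)


def det3(u, v, w):
    """Determinant of the 3x3 matrix whose columns are u, v and w."""
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - v[0] * (u[1] * w[2] - u[2] * w[1])
            + w[0] * (u[1] * v[2] - u[2] * v[1]))


def line_orientation(q1, q2, q3):
    """Sign of det[q1 q2 q3] for three normalised lines."""
    d = det3(q1, q2, q3)
    return (d > 0) - (d < 0)


q1 = line_coords((0.0, 0.0), (1.0, 0.0))   # y = 0
q2 = line_coords((0.0, 0.0), (1.0, 1.0))   # y = x
q3 = line_coords((0.0, 1.0), (1.0, -1.0))  # y = -x + 1
s = line_orientation(q1, q2, q3)
```

Three concurrent lines, or two parallel ones, give a zero determinant; such degenerate triplets would need separate handling.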







2.4 Combination of Points and Lines 

By using oriented homogeneous coordinates for lines we introduce a direction 
for every line. We can therefore assign a left-right position for every point in the 
plane relative to every line. For line qi and point pj this is given by: 

$\chi_{pl}(i,j) = \operatorname{sign}(q_i^T p_j) \in \{-1, 1\}$ (8) 

For every arrangement of points and lines we can therefore talk about the order 
structure of combinations of points and lines represented by these signs. 
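
A minimal sketch of this point-line sign, assuming the normalisation q = (a, 1, b) for lines and (x, y, 1) for points; the function name `side_of_line` is ours. With this normalisation a positive sign means that the point lies above the line, since ax + y + b > 0 is the same as y > -ax - b.

```python
def side_of_line(line, point):
    """Sign of q^T p for a line q = (a, c, b) and a point (x, y) lifted to (x, y, 1).

    Returns +1 or -1 for the two sides of the line and 0 for a point on it.
    """
    a, c, b = line
    x, y = point
    value = a * x + c * y + b
    return (value > 0) - (value < 0)


x_axis = (0.0, 1.0, 0.0)           # the line y = 0
above = side_of_line(x_axis, (3.0, 2.0))
below = side_of_line(x_axis, (3.0, -2.0))
```

Note that a point always gives 0 against its own tangent line, so only pairs of a line and a different point carry information.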



2.5 Order Structure Index from Points and Tangent Lines 

An arbitrary set of points and tangent lines can now be assigned a unique order 
structure based on point, line and point-line order structure given by the sign 
sets $\chi_p(i,j,k)$, $\chi_l(i,j,k)$ for all triplets of points and lines in the set, and 
$\chi_{pl}(i,j)$ for all pairs. If we start by computing the order type for the set of points, we 
get a unique numbering from the canonical ordering. In the case where ordering 
is ambiguous up to cyclic permutations we choose the leftmost point as the first 
point. Given the numbering of the points, we can compute all determinant signs 




for triplets of oriented line coordinates and inner product signs for lines and 
points. These signs can be combined into indexes which characterize the order 
structure of the point-line arrangement. Note that points and lines play different 
roles in defining these indexes. The order structure of the points is used for the 
numbering of the points and lines. By giving an identity to each point and line, 
we increase the combinatorial richness of the arrangement of lines and thereby 
its discriminatory power compared to the case when the lines are not numbered. 
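
One possible way to combine the signs into a single index is sketched below. It is an illustration only: the text does not specify how the signs are packed, so the bit packing, the function name `order_structure_index`, and the choice to skip the pairs where a point meets its own tangent line are all assumptions. The points and lines are assumed to be already numbered in canonical order, with lines normalised as (a, 1, b).

```python
from itertools import combinations, permutations


def sign(value):
    return (value > 0) - (value < 0)


def det3(u, v, w):
    """Determinant of the 3x3 matrix whose columns are u, v and w."""
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - v[0] * (u[1] * w[2] - u[2] * w[1])
            + w[0] * (u[1] * v[2] - u[2] * v[1]))


def order_structure_index(points, lines):
    """Pack point, line and point-line signs into one integer (hypothetical layout)."""
    pts = [(x, y, 1.0) for x, y in points]
    n = len(pts)
    triplets = list(combinations(range(n), 3))
    signs = [sign(det3(pts[i], pts[j], pts[k])) for i, j, k in triplets]
    signs += [sign(det3(lines[i], lines[j], lines[k])) for i, j, k in triplets]
    # Line i against point j; i == j is skipped because a point lies on its own tangent.
    signs += [sign(sum(q * p for q, p in zip(lines[i], pts[j])))
              for i, j in permutations(range(n), 2)]
    index = 0
    for s in signs:
        index = (index << 1) | (s > 0)  # one bit per sign; degenerate zeros map to 0
    return index


points = [(0, 0), (4, 0), (5, 3), (2, 5), (-1, 3)]
slopes = [0.5, -1.0, 2.0, -0.5, 1.0]                      # the "a" of each tangent line
lines = [(a, 1.0, -(a * x + y)) for a, (x, y) in zip(slopes, points)]
idx = order_structure_index(points, lines)
```

For five points this gives 10 + 10 + 20 = 40 signs, so the index fits in 40 bits. The index is unchanged when the whole arrangement is translated, as expected of an order structure descriptor.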

3 Combinatorial Geometric Hashing - Computing 
Correspondence 

Given subsets of points and lines from two shapes, we can compare their or- 
der structure. Order structure equivalence of subsets of points and tangents of 
an arbitrary shape can be used to establish point correspondence between the 
shapes. For two grey value images, we only have to sample points and tangents, 
i.e. there is no specific feature extraction or perceptual grouping stage neces- 
sary. The qualitative nature of the order structure concept means that metric 
accuracy of sampling point positions and tangent line directions is not required, 
leading to a robust scheme for computing correspondence. 

An algorithm for computing point correspondence between arbitrary shapes 
in grey level images has been implemented using geometric hashing (Lamdan et 
al. 1988) based on the combinatorial order structure of subsets of points and 
tangent lines. The algorithm computes point correspondence of a certain shape 
and that of a “model-shape” stored in the form of tables that can be indexed. 
The first part of the algorithm is the construction of the model-shape index 
table, and the second part is the actual indexing. The steps of the algorithm 
are illustrated by fig. 7. Both the modeling and indexing stages are based on 
sampling edge points of a shape and selecting all five point combinations. For a 
certain five point set, the canonical order type and an order structure index are 
computed based on the point coordinates and the line coordinates of the tangent 
lines of the edge points. The order structure index is used to identify the specific 
point set in the model table. All five point sets with the same order structure 
index are stored in the same entry given by the index. 

Given a model table we can compute correspondence between the points in 
the table and the points that have been sampled from another shape. From 
the shape to be matched we sample five point combinations and compute order 
structure indexes just as in the modeling stage. The order structure index for a 
certain five point combination is used as a key to the model table and we note 
all the model five point combinations that are stored in the model table with 
that index. For each stored five point combination we vote for a correspondence 
to the five points from the shape to be matched. The end result is therefore a 
table of correspondence votes between the points of model shape and those of 
the shape to be matched. 
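
The two stages described above can be sketched as follows. This is a simplified illustration, not the implementation used for the experiments: the callback `describe`, which is assumed to return the order structure index of a feature group together with the group's members in canonical order, is hypothetical, as are the function names. The toy descriptor in the usage example is a stand-in that has nothing to do with order structure; it only exercises the table and the voting.

```python
from collections import defaultdict
from itertools import combinations


def build_model_table(n_model, describe, group_size=5):
    """Model building: store every feature group under its index."""
    table = defaultdict(list)
    for combo in combinations(range(n_model), group_size):
        key, ordered = describe(combo)
        table[key].append(ordered)
    return table


def accumulate_votes(table, n_model, n_query, describe, group_size=5):
    """Indexing: look up every query group and vote for point correspondences."""
    votes = [[0] * n_query for _ in range(n_model)]
    for combo in combinations(range(n_query), group_size):
        key, ordered = describe(combo)
        for model_ordered in table.get(key, ()):
            # The canonical ordering pairs the t-th model point with the t-th query point.
            for m, q in zip(model_ordered, ordered):
                votes[m][q] += 1
    return votes


def toy_describe(combo):
    """Stand-in descriptor: the gaps between consecutive feature numbers."""
    return tuple(b - a for a, b in zip(combo, combo[1:])), combo


table = build_model_table(7, toy_describe, group_size=3)
votes = accumulate_votes(table, 7, 7, toy_describe, group_size=3)
```

Matching the toy shape to itself gives a vote table with a dominant diagonal and some votes for neighbouring points off the diagonal.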

4 Results and Discussion 



The algorithm for computing correspondence has been tested on a set of shapes 
that were generated manually. Fig. 8 shows two shapes (A) and (B) of 12 points 




66 



t n Isson 



MODEL BUILDING 

1. Sample edge points and tangent lines 
2. Select all combinations of five points 
3. Compute canonical point order from order type 
4. Compute order structure index from 
   - point set structure 
   - line set structure 
   - combined point and line set structure 
5. Store identities of the five points in table with index as look-up key 

MODEL INDEXING 

1. Steps 1-4 as in model building, applied to the shape to be matched 
2. Use index as entry to model table 
3. Note the correspondence between all five point groups in model table 
4. Accumulate correspondence votes between points 

Fig. 7. Combinatorial geometric hashing for computing correspondence 



and their tangent lines. (The points are located at the midpoints of the tangent 
line segments.) The first table shows the normalised accumulated correspondence 
votes for the points when the shape (A) is matched to itself. We see that we get 
a dominant diagonal in the table, as could be expected. We also get votes 
for incorrect correspondences, but these are almost invariably associated with 
matchings to nearest neighbouring points. The second table shows the result of 





A 

123456789 10 11 12 

1 66 14 8- -- -- -- -- 

2 14 46 28 --------- 

38 28 56 --------- 



4- --6B 32 9- -- -- - 

5- -- 32 39 26 9----- 

A6--- 9264032----- 

7- --- 9 32 65----- 

8- ----- - 67 29- 9 

9- ----- - 2 39 429 

10 ------ - 94 41 - 8 

11------- - 2-27 1 

12 ------ - 9981 40 



B 

123456789 10 11 12 

1 19 52------- - 11 

2 48 22 14 - -- -- -- 39 

3 17 53 55--------- 

4-- - 78 33 7------ 

5 - - - 35 51 35 12 - - - - - 

A 6--- 9 33 46 31----- 

7--- - 5 34 80 ----- 

8 - - - - - - - 62 12 19 2 19 

9---- - 1--13- 2- 

10 ------ - 66 42 16 

11- ------ - 4 - 27 8 

12- ----- - 7- 7- 6 



Fig. 8. Normalised accumulated correspondence votes for a shape (A) matched 
to itself and a deformed version (B) 







correspondence when the shape (A) is matched to a deformed version (B). The 
peaks in the correspondence table are all at perceptually “correct” correspon- 
dences with the exception of point 12. The tables in this figure as well as the 
following ones are normalised for readability. 

Fig. 9 shows a matching between pictures of a mug with a handle and a cup 
with a handle. From the table we see that most points of the mug are matched 
to perceptually corresponding points of the cup if the peak in the table is used as 
the matching criterion. The table has been partitioned into shape parts and there 
is a clear “parts” correspondence that can be read out from the table. A major 
ambiguity is that of the inner and outer parts of the mug’s handle that are 
confused with the corresponding inner and outer parts of the cup’s handle and 
that with the cup itself. 

Another example of matching instances of the same object category is shown 
in fig. 10 where the outlines of the pictures of a goose and a crow are matched 
to each other. The overall correspondence is perceptually plausible with the 
exception of the top of the goose's back matching to that of the head of the crow. 

The results show that order structure of sampled points and tangent lines of 
a shape can be used to establish point correspondence between shapes 
that are projections of instances of the same object category. The basic reason 
for this is that order structure is well preserved over shape deformations that 
do not alter the perceptual category of the shape. It is therefore an interesting 
alternative for use in recognition of shape based categories. Order structure can 
be considered as relational structure of very low level features. In that respect 
it is similar to shape category theories based on parts decomposition (Marr and 
Nishihara 1978, Biederman 1985) where object categories are defined based on 
relational structure of generic subparts. The fact that we use low level features 
however, means that we can bypass the stage of perceptual grouping that is 
necessary to define subparts which has proven to be very difficult to achieve in 
a robust way in computer vision. The correspondences that can be established 
between instances of the same object category using order structure indexing 
could be used to define a similarity measure which could be used as a basis for 
a system for categorical recognition (Rosch 1988, Edelman 1998) . 

The fact that we can bypass perceptual grouping is a major advantage with 
our approach since this is a major bottleneck in any recognition system. Since 
order structure is invariant w.r.t. small perturbations of feature locations we also 
gain in robustness since no exact feature localisation is necessary. The extrac- 
tion of features, i.e. points and tangent directions is essentially just a sampling 
process. 

The advantages of using indexing based on combinations of features in this 
way of course come at the price of combinatorial complexity. By considering all 
combinations of five point features, the complexity of the algorithm will grow 
as $O(n^5)$, where n is the number of features in the image. In practice this means 
that we cannot consider more than 25-30 features at a time. A major issue is 
therefore to find means of reducing this complexity by being selective in the 
choice of feature combinations and only selecting a subset of all possibilities. 
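
To give a feeling for the numbers involved, the count of five point groups among n features is the binomial coefficient C(n, 5), which grows like the fifth power of n. The short sketch below simply evaluates it; the function name `five_point_groups` is ours.

```python
from math import comb


def five_point_groups(n_features):
    """Number of distinct five point combinations among n_features features."""
    return comb(n_features, 5)


counts = {n: five_point_groups(n) for n in (12, 25, 30, 100)}
# counts == {12: 792, 25: 53130, 30: 142506, 100: 75287520}
```

Going from 30 to 100 features multiplies the number of groups by more than 500, which is why a selective choice of feature combinations matters.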








CUP 



1234567 8 9 10 11 12 13 14 15 16 17 18 19 20 21 

1 41 41 ---16 14 ------ ---- ---- 

2 40 43-- 11 -18 ------ ---- ---- 

3 14 14 19 20--- ------ ---- ---- 

4 14 15 -20 17 -- ------ ---- ---- 

5 14 14 - 13 22 - - - -- -- - - -- - - -- - 

6---- - 30 24 - -- -- - - - 17 - - - 16- 

7 14 16 ---27 39 ------ --13- --12- 

8------- 96 53 21--- ---- ---- 

9 ------ - 69 55 38 16 - - - - - - - - - - 

10------- 49 51 45 28-- ---- ---- 

M 11 ------- 33 46 50 38-- ---- ---- 

U 12 ------- 19 38 48 47 13- ____ ____ 

G 13 ------- -19 32 40 16- ____ ____ 

14 _______ ___i2- - ____ ____ 

15 _______ ______ ____ ____ 

16 _______ - -- --12 ____ ____ 

17 _______ _____i2 ____ ____ 

18- ------ -12 19 26 13- ____ ____ 

19- ----- - - --20 19- -21- - -21-- 

20 ------ - - -- 13 20 14 - 26 - - - 25 -- 

21 - -- --16 14 ______ __43_ __40- 

22- ----- - ______ ___i5 ___i5 

23- ------ -- 12 19 15- ____ ____ 

24- ------ ---14 26 12 -24-- -24-- 

25 ----- 15 - ------ - - 38 - --43- 

26 ----- 17 18 ------ - - 34 - - - 36 - 

27------ - - -- -- - - -- - - -- - 



Fig. 9. Normalised accumulated correspondence votes for MUG and CUP 







CROW 

1 3 4 5 6 7 8 9 13 16 17 18 19 20 21 22 23 24 25 26 27 28 

118--------------------- 

2 19 - -- -- -- -- -- -- -- -- -- -- 

4 51 - -- -- -- -- -- -- -- -- -- -- 



5- 41-51------------------ 

6- 2049732322---------------- 

7- 19 15 70 44 42-- ------------- - 

8- --18434048-15------------- 

G 9---22434039-17------------- 

010- ---2222473738------------- 

011- -----413636------------- 

S12------446368------------- 

E 13 - -- -- -- 16 33 ------------- 

14 --------- 45 20 ----13 ------ 

15 - - - - - - - - - - - 32 25 31 31 - 23 21 14 - - 

16 - - - - - - - - - - 12 26 23 20 21 - 17 - - - - 

17 - - - - - - - - - - - 11 13 12 - 18 12 - 11 30 15 13 

18 --------------- 24 ---26 39 19 

19 ---------------- 19 11---- 

21 - -- -- -- -- -- -- -- -- 21 ---- 

22 - - - - - - - - - - - - - 13 14 15 - - 30 37 21 19 

23 --------------- 15 ---28 37 19 



Fig. 10. Normalised accumulated correspondence votes for GOOSE and CROW 





In a sense, this is exactly what classical perceptual grouping tries to achieve 
by using rules of non-accidentalness of image features, (Lowe 1985). Order struc- 
ture indexing implies a more general approach to this problem since we do not 
limit ourselves a priori to specific feature relations but can use arbitrary combi- 
nations on the basis of their effectiveness in establishing correspondence. 



References 



1. Basri R., Costa L., Geiger D., Jacobs D., Determining the similarity of 
deformable objects, Vision Research, 38, Issue 15-16, pp. 2365-2385, August (1998) 

2. Biederman I., Human image understanding: recent research and a theory, CVGIP 
32, 29-73 (1985) 

3. Björner A., Las Vergnas M., Sturmfels B., White N., Ziegler G.M., Oriented 
Matroids, Encyclopedia of Mathematics and its Applications, Vol. 46, G.-C. Rota 
editor, Cambridge University Press (1993) 

4. Burr D.J., Elastic Matching of Line Drawings, IEEE Trans. on Pattern Analysis 
and Machine Intelligence, Vol. 3, No. 6, pp. 708-713, November (1981) 

5. Bülthoff H.H., Edelman S. and Tarr M., How are three-dimensional objects 
represented in the brain?, Cerebral Cortex 5, 247-260 (1995). 

6. Carlsson S., Combinatorial geometry for shape representation and indexing, 
Object Representation in Computer Vision II, Springer Lecture Notes in Computer 
Science 1144, pp. 53-78, Ponce, Zisserman eds. (1996) 

7. Carlsson S., Geometric structure and view invariant recognition, Philosophical 
Transactions of the Royal Society of London Series A, 356, 1233-1247 (1998) 

8. Edelman S., Representation is representation of similarity, Behavioral and 
Brain Sciences (to appear) (1998) 

9. Goodman J.E. and Pollack R., Multidimensional sorting, SIAM J. Comput. 12, 
484-507 (1983) 

10. Koenderink J.J. and van Doorn A., The internal representation of solid shape 
with respect to vision, Biological Cybernetics 32, 211-216 (1979) 

11. Lamdan Y., Schwartz J.T. and Wolfson H.J., Object recognition by affine 
invariant matching. In: Proc. CVPR-88, pp. 335-344. (1988) 

12. Lowe D.G., Perceptual Organisation and Visual Recognition, Kluwer (1984). 

13. Marr D. and Nishihara H.K., Representation and recognition of the spatial 
organisation of three dimensional shapes, Proc. Roy. Soc. B 200, 269-294 (1978) 

14. Rosch E., Principles of categorisation, In: Rosch and Lloyd eds., Cognition and 
categorisation, pp. 27-48, Erlbaum, Hillsdale NJ (1988) 

15. Tarr M.J., Hayward W.G., Gauthier I. and Williams P., Is object recognition 
mediated by viewpoint invariant parts or viewpoint dependent features?, 
Perception 24, 4 (1995) 

16. Yoshida K. and Sakoe H., Online handwritten character recognition for a 
personal computer system, IEEE Trans. on Consumer Electronics CE-28 (3), 202-209 
(1982) 




Quasi-Invariant Parameterisations 
and Their Applications in Computer Vision* 



Jun Sato¹,² and Roberto Cipolla² 

¹ Nagoya Institute of Technology, Nagoya, Japan 

² Department of Engineering, University of Cambridge, Cambridge CB2 1PZ, UK 



Abstract. In this paper, we show that there exist quasi-invariant 
parameterisations which are not exactly invariant but approximately invariant 
under group transformations and do not require high order derivatives. 
The affine quasi-invariant parameterisation is investigated in more 
detail and exploited for defining general affine semi-local invariants from 
second order derivatives only. The new invariants are implemented and 
used for matching curve segments under general affine motions and 
extracting symmetry axes of objects with 3D bilateral symmetry. 

1 Introduction 

ativ motion tw n an o s v an t s n aus s isto tions in im- 
ag s. s isto tions an si y sp i transformation groups 1 
su as u i an a n an p oj tiv t ans o mations. us g om t i inva i- 

ants un t s t ans o mation g oups a v y impo tant o o j t ognition 

an i nti ation 1 1 20 22 2 . 

t oug inva iants on points av n stu i xt nsiv y 1 22 32 in- 
va iants on smoot u v s an u v su a s av not n xp o noug 

23 2 30 . t a itiona inva iants o smoot u v s a i ntia inva i- 

ants 30 . Sin t s inva iants qui ig o ivativ s many m t o s 

av n stu i to u t o o ivativ s o ning inva iants on 
u V s. 

w av noug num o istinguis points on u v s t s points 

p ovi 00 inat am s to no ma is isto t u v s wit out using iva- 
tiv s 32 . op ana u v s t istinguis points a qui o no - 
ma ising a n isto tions an ou points o p oj tiv isto tions. w us 
t i ntia p op ty o u v s w an u t num o istinguis 
points qui o omputing inva iants on u v s 1 20 2 . o xamp i 

w av st ivativ s two istinguis points a noug to omput inva i- 

ants un p oj tiv t ans o mations 2 . ow v to n o spon n so 

istinguis points on t o igina an isto t u v s is non-t ivia p o m. 

o op wit t s p o ms s mi- o a inva iants w a so p opos 23 . 
y s ow t at it is possi to n inva iants s mi- o a y y w i t 
o o ivativ s in inva iants an u om t at o g oup u vatu s 

* h utho knowl g th uppo t o th g nt /K 202 



D.A. Forsyth et al. (Eds.): Shape, Contour LNCS 1681, pp. 72-92, 1999. 
© Springer-Verlag Berlin Heidelberg 1999 







to t at o g oup a - ngt wit out using any istinguis points on u v s. 

s w av s n in t s wo ks t inva iant pa am t isation is impo tant to 
gua ant uniqu i nti ation o o spon ing int va s on u v s. 

t oug s mi- o a inva iants u t o o ivativ s qui it 
is known t at t o is sti ig in t g n a a n an p oj tiv as s 

(s ta 1). nt is pap w int o u a quasi-invariant parameterisation an 
s ow ow it na s us to us s on o ivativ s inst a o ou t an 

t . i a o quasi-inva iant pa am t isation is to app oximat t g oup 
inva iant a - ngt y ow o ivativ s. n w pa am t isations a 

t o ss s nsitiv to nois an a app oximat y inva iant un a s ig t y 
st i t ang o imag isto tions. 

The concept of quasi-invariants was originally proposed by Binford [2], who
showed that quasi-invariants enable a reduction in the number of corresponding
points required for computing algebraic invariants. For example, quasi-invariants
require only four points for computing planar projective invariants [2], while
exact planar projective invariants require five points [17]. It has also been shown
that quasi-invariants exist even under the situation where the exact invariant
does not exist [2]. In spite of its potential, the quasi-invariant has not previously
been studied in detail. One reason for this is that the concept of quasiness is
rather ambiguous and is difficult to formalise. Furthermore, the existing method
is limited to the quasi-invariants based on point correspondences [2], or the
quasi-invariants under specific models [31].

In this paper, we investigate quasi-invariance on smooth manifolds, and show
that there exists a quasi-invariant parameterisation, that is a parameterisation
approximately invariant under group transformations. Although the
approximated values are no longer exact invariants, their changes are negligible
for a restricted range of transformations. Hence, the aim is to find the
parameterisations which give the best tradeoff between the error caused by the
approximation and the error caused by image noise.

Following the motivation, we investigate a measure of invariance which
describes the difference from the exact invariant under group transformations. To
formalise a measure of invariance in differential formulae, we introduce the so
called prolongation [18] of vector fields. We next define a quasi-invariant
parameter as a function which minimises the difference from the exact invariant.
A quasi-invariant parameter under general affine transformations is then
proposed. The proposed parameter is applied to semi-local integral invariants and
exploited successfully for matching curves under general affine transformations
in real image sequences.

2 Semi-local Invariants 

If the invariants are too local, such as differential invariants [30], they suffer
from noise. If the invariants are too global, such as moment (integral) invariants
[11, 26], they suffer from occlusion and the requirement of correspondences.
It has been shown recently [23] that it is possible to define integral invariants
semi-locally, so that they do not suffer from occlusion, image noise and the
requirement of correspondences.

Jun Sato and Roberto Cipolla

Fig. 1. Identifying the interval of integration semi-locally. (a) and (b) are images
of a Japanese character extracted from the first and the second viewpoints.
The interval of integration in these two images can be identified uniquely from
invariant arc-length. For example, an interval $(w - \Delta w, w + \Delta w)$
corresponds to an interval $(\tilde{w} - \Delta w, \tilde{w} + \Delta w)$.

Consider a curve $C$ to be parameterised by $t$. It is also possible to
parameterise the curve by invariant parameters $w$ under specific transformation
groups. These are called arc-length of the group. The important property of
group arc-length is that it enables us to identify the corresponding interval of
curves automatically. Consider a point $C(w)$ on a curve $C$ to be transformed to
a point $\tilde{C}(\tilde{w})$ on a curve $\tilde{C}$ by a group transformation as
shown in Fig. 1. Since $w$ is an invariant parameterisation, if we take an interval
$(w - \Delta w, w + \Delta w)$ on $C$ and an interval
$(\tilde{w} - \Delta w, \tilde{w} + \Delta w)$ on $\tilde{C}$, then these two
intervals correspond to each other (see Fig. 1). That is, by integrating with
respect to the group arc-length, the corresponding interval of integration of the
original and the transformed curves can be uniquely identified.

By using the invariant parameterisations, we can define semi-local invariants
$I$ at point $C(w)$ with interval $(-\Delta w, \Delta w)$ as follows:

$$I(w) = \int_{w - \Delta w}^{w + \Delta w} F \, dw \qquad (1)$$

where $F$ is any invariant function under the group. The choice of $F$ provides
various kinds of semi-local invariants [23]. If we choose the function $F$ carefully,
the integral formula (1) can be solved analytically, and the resulting invariants
have simple forms. For example, in the affine case, we have the following
semi-local invariants:





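Fig. 2. Semi-local invariants. In the affine case, semi-local invariants are defined
as the ratio of two areas defined by the invariant parameter $w$.

$$I(w) = \frac{S_1(w)}{S_2(w)} \qquad (2)$$

where $S_1$ and $S_2$ are the areas made of two sets of two vectors,
$C(w + \Delta w_1) - C(w)$ and $C(w - \Delta w_1) - C(w)$, and
$C(w + \Delta w_2) - C(w)$ and $C(w - \Delta w_2) - C(w)$, as follows:

$$S_1(w) = \tfrac{1}{2} \left[ C(w + \Delta w_1) - C(w), \; C(w - \Delta w_1) - C(w) \right]$$

$$S_2(w) = \tfrac{1}{2} \left[ C(w + \Delta w_2) - C(w), \; C(w - \Delta w_2) - C(w) \right]$$

where $[\mathbf{x}_1, \mathbf{x}_2]$ denotes the determinant of a matrix which
consists of two column vectors $\mathbf{x}_1, \mathbf{x}_2 \in R^2$.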
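As an illustration of the area-ratio invariant (2), the following minimal sketch evaluates the ratio of the two triangle areas on a curve and checks numerically that it is unchanged by an affine map. The test curve, the map `A`, `t` and all helper names are hypothetical, and the shared parameter `w` merely stands in for an invariant arc-length; that is an assumption made for the demonstration, not the parameterisation derived later in the paper.

```python
import math


def det2(u, v):
    """Determinant of the 2x2 matrix whose columns are u and v."""
    return u[0] * v[1] - u[1] * v[0]


def semi_local_invariant(curve, w, dw1, dw2):
    """Ratio S1/S2 of the areas spanned by C(w +/- dw) - C(w)."""
    c = curve(w)

    def area(dw):
        fwd = curve(w + dw)
        bwd = curve(w - dw)
        return 0.5 * det2((fwd[0] - c[0], fwd[1] - c[1]),
                          (bwd[0] - c[0], bwd[1] - c[1]))

    return area(dw1) / area(dw2)


def original(w):
    # Hypothetical test curve.
    return (w, math.sin(2.0 * w))


# Hypothetical general affine map x -> A x + t (A is invertible).
A = ((1.3, 0.4), (-0.2, 0.9))
t = (5.0, -1.0)


def transformed(w):
    # The transformed curve is sampled at the same parameter value,
    # which is what an invariant parameterisation would guarantee.
    x, y = original(w)
    return (A[0][0] * x + A[0][1] * y + t[0],
            A[1][0] * x + A[1][1] * y + t[1])


i_orig = semi_local_invariant(original, 0.7, 0.2, 0.4)
i_trans = semi_local_invariant(transformed, 0.7, 0.2, 0.4)
print(i_orig, i_trans)
```

Both areas are multiplied by the determinant of `A` under the map, so the factor cancels in the ratio; the translation drops out because only differences of curve points are used.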

From Table 1 [19], it is clear that the semi-local invariants are useful under
Euclidean and special affine cases, but they still require high order derivatives
in general affine and projective cases.

The distortion caused by a group transformation is often not so large. For
example, the distortion caused by the relative motion between the observer and
the scene is restricted because of the finite speed of the camera or object
motions. In such cases, parameters approximated by lower order derivatives give
us a good approximation of the exact invariant parameterisation. We call such a
parameterisation a quasi-invariant parameterisation. In the following sections, we
define the quasi-invariant parameterisation, and derive an affine quasi-invariant
parameterisation.

Table 1. Order of derivatives required for the group arc-length and curvature.
In general, derivatives of more than the second order are sensitive to noise, and
are not available from images. Thus, the general affine and projective arc-length
as well as curvatures are not practical.

group             arc-length    curvature
Euclidean         1st           2nd
special affine    2nd           4th
general affine    4th           5th
projective        5th           7th



3 Infinitesimal Quasi-Invariance 

For deriving quasi-invariant parameterisations, we first consider the concept
of infinitesimal quasi-invariance; that is, quasi-invariance under infinitesimal
group transformations.



3.1 Vector Fields of the Group 

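Let $G$ be a Lie group, that is a group which carries the structure of a smooth
manifold in such a way that both the group operation (multiplication) and the
inversion are smooth maps [18]. Transformation groups such as rotation,
Euclidean, affine and projective groups are Lie groups.

Consider an image point $\mathbf{x} = [x, y]^\top$ to be transformed to
$\tilde{\mathbf{x}} = [\tilde{x}, \tilde{y}]^\top$ by a group transformation
$h \in G$, so that a function $I(x, y)$ with respect to $x$ and $y$ coordinates is
transformed to $\tilde{I}(\tilde{x}, \tilde{y})$ by $h$.
Infinitesimally, this is considered as an action of a 2D vector field $\mathbf{v}$:

$$\mathbf{v} = \xi \frac{\partial}{\partial x} + \eta \frac{\partial}{\partial y} \qquad (3)$$

where $\xi$ and $\eta$ are functions of $x$ and $y$. Locally the orbit of the point
$\mathbf{x}$ caused by the transformation $h$ is described by an integral curve
$\Gamma$ of the vector field $\mathbf{v}$ passing through the point (see Fig. 3).
The uniqueness of the solution of an ordinary differential equation guarantees
the existence of such a unique integral curve in the vector field.

Because of its linearity, the infinitesimal generator can be described by the
summation of a finite number of independent vector fields
$\mathbf{v}_i$ $(i = 1, 2, \ldots, m)$ of the group as follows:

$$\mathbf{v} = \sum_{i=1}^{m} \mathbf{v}_i \qquad (4)$$

where $\mathbf{v}_i$ is the $i$-th independent vector field:

$$\mathbf{v}_i = \xi_i \frac{\partial}{\partial x} + \eta_i \frac{\partial}{\partial y} \qquad (5)$$

where $\xi_i$ and $\eta_i$ are basis coefficients of $\partial/\partial x$ and
$\partial/\partial y$ respectively, and are functions of $x$ and $y$. These
independent vector fields form a finite dimensional vector space called a Lie
algebra [18]. Locally any transformation of the group can be described by an
integral of a finite number of independent vector fields $\mathbf{v}_i$.
The vector field described in (3) acts as a differential operator of the Lie
derivative.

Fig. 3. Vector field $\mathbf{v}$ and an integral curve $\Gamma$. The curve $C$ is
transformed to $\tilde{C}$ by a group transformation, so that the point $P$ on the
curve is transformed to $\tilde{P}$. Locally the orbit of the point caused by a
group transformation coincides with the integral curve $\Gamma$ of the vector
field at the point $P$.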

3.2 Exact Invariance 

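Let $\mathbf{v}$ be an infinitesimal generator of the group transformation. A
real-valued function $I$ is invariant under group transformations if and only if
the Lie derivative of $I$ with respect to any infinitesimal generator $\mathbf{v}$
of the group $G$ vanishes as follows [18]:

$$\mathcal{L}_{\mathbf{v}}[I] = 0 \qquad (6)$$

where $\mathcal{L}_{\mathbf{v}}[\cdot]$ denotes the Lie derivative with respect to
a vector field $\mathbf{v}$. Since $I$ is a scalar function, the Lie derivative is
the same as the directional derivative with respect to $\mathbf{v}$. Thus the
condition of invariance (6) can be rewritten as follows:

$$\mathbf{v}[I] = 0 \qquad (7)$$

where $\mathbf{v}[\cdot]$ is the directional derivative with respect to $\mathbf{v}$.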





3.3 Infinitesimal Quasi-Invariance

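The idea of quasi-invariance is to approximate the exact invariant by a certain
function $I(x, y)$ which is not exactly invariant but nearly invariant. If the
function $I$ is not exactly invariant, equation (7) no longer holds. We can however
measure the difference from the exact invariant by using (7). By definition, the
change in function $I$ caused by the infinitesimal group transformation induced
by a vector field $\mathbf{v}$ is described by the Lie derivative of $I$ as follows:

$$\delta I = \mathbf{v}[I] = \sum_{i=1}^{m} \mathbf{v}_i[I] \qquad (8)$$

For measuring the invariance of a function irrespective of the choice of basis
vectors, we consider an intrinsic vector field of the group $G$. It is known [25]
that if the group is semi-simple (e.g. rotation group, special linear group), there
exists a non-degenerate symmetric bilinear form called the Killing form $K$ of the
Lie algebra as follows:

$$K(\mathbf{v}_i, \mathbf{v}_j) = \mathrm{tr}\left( \mathrm{ad}(\mathbf{v}_i) \, \mathrm{ad}(\mathbf{v}_j) \right) \quad (i, j = 1, 2, \ldots, m) \qquad (9)$$

where $\mathrm{ad}(\mathbf{v}_i)$ denotes the adjoint representation of
$\mathbf{v}_i$, and $\mathrm{tr}$ denotes the trace. (The adjoint representation
$\mathrm{ad}(\mathbf{v}_i)$ provides an $m \times m$ matrix representation of the
algebra, whose $(j, k)$ component is described by a structure constant
$C_{ij}^{k}$ [25].) The Killing form provides the metric tensor $g_{ij}$ of the
algebra:

$$g_{ij} = K(\mathbf{v}_i, \mathbf{v}_j) \qquad (10)$$

and the Casimir operator $C_a$ defined by the metric tensor is independent of the
choice of the basis vectors:

$$C_a = g^{ij} \mathbf{v}_i \mathbf{v}_j$$

where $g^{ij}$ is the inverse of $g_{ij}$. That is, the metric $g^{ij}$ changes
according to the choice of basis vectors $\mathbf{v}_i$, so that $C_a$ is an
invariant. Since $g^{ij}$ is symmetric, there exists a choice of basis vectors
$\mathbf{v}_i$ $(i = 1, 2, \ldots, m)$ by which $g^{ij}$ is diagonalised as follows:

$$g^{ij} = \begin{cases} \pm 1 & (i = j) \\ 0 & (i \neq j) \end{cases}$$

Such vector fields $\mathbf{v}_i$ $(i = 1, 2, \ldots, m)$ are unique in the group
$G$, and thus intrinsic. By using the intrinsic vector fields in (8), we can measure
the change in value of a function $I$ which is intrinsic to the group $G$.

For measuring the quasi-invariance of a function irrespective of the magnitude
of the function, we consider the change in function $\delta I$ normalised by the
original function $I$. We thus define a measure of infinitesimal quasi-invariance
$Q$ of a function $I$ by the square sum of normalised changes in function caused
by the intrinsic vector fields $\mathbf{v}_i$ $(i = 1, 2, \ldots, m)$ as follows:

$$Q = \sum_{i=1}^{m} \left( \frac{\mathbf{v}_i[I]}{I} \right)^2 \qquad (11)$$

This is a measure of how invariant the function is under the group
transformation. If $Q$ is small enough, we call $I$ a quasi-invariant under
infinitesimal group transformations.

Unfortunately, if the group is not semi-simple (e.g. general affine group,
general linear group), the Killing form is degenerate and we do not have such
intrinsic vector fields. However, it is known that a non-semi-simple group is
decomposed into a semi-simple group and a radical. Thus, in such cases, we choose
a set of vector fields which correspond to the semi-simple group and the radical.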
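The measure described above (the square sum of the changes of a function along the vector fields, each normalised by the function value) can be tried out numerically. The sketch below is only an illustration under stated assumptions: the directional derivative is approximated by central differences, a single planar rotation field is used as a stand-in for a set of intrinsic vector fields, and the two test functions and all names are hypothetical.

```python
def directional_derivative(field, func, x, y, h=1e-5):
    """Approximate v[I] = xi * dI/dx + eta * dI/dy by central differences."""
    xi, eta = field(x, y)
    d_dx = (func(x + h, y) - func(x - h, y)) / (2.0 * h)
    d_dy = (func(x, y + h) - func(x, y - h)) / (2.0 * h)
    return xi * d_dx + eta * d_dy


def quasi_invariance(fields, func, x, y):
    """Square sum of the changes v_i[I], each normalised by I itself."""
    value = func(x, y)
    return sum((directional_derivative(v, func, x, y) / value) ** 2
               for v in fields)


def rotation(x, y):
    # Infinitesimal rotation: v = -y d/dx + x d/dy.
    return (-y, x)


def radius_squared(x, y):
    # Exactly invariant under rotation about the origin.
    return x * x + y * y


def perturbed(x, y):
    # Hypothetical function that is only nearly rotation invariant.
    return x * x + 1.1 * y * y


q_exact = quasi_invariance([rotation], radius_squared, 1.0, 2.0)
q_near = quasi_invariance([rotation], perturbed, 1.0, 2.0)
print(q_exact, q_near)
```

For the exactly invariant function the measure is zero up to rounding, while the perturbed function gives a small positive value, which is the sense in which a function is called quasi-invariant.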







4 Quasi-Invariance on Smooth Manifolds 

In the last section, we introduced the concept of infinitesimal quasi-invariance,
which is the quasi-invariance under infinitesimal group transformations, and
derived a measure for the invariance of an approximated function. Unfortunately,
(11) is valid only for functions which do not include derivatives. In this section,
we introduce an important concept known as the prolongation [18] of vector
fields, and investigate quasi-invariance on smooth manifolds, so that it enables
us to define quasi-invariants with a differential formula.

4.1 Prolongation of Vector Fields 

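The prolongation is a method for investigating the differential world from a
geometric point of view. Let a smooth curve $C \subset R^2$ be described by an
independent variable $x$ and a dependent variable $y$ with a smooth function $f$
as follows:

$$y = f(x)$$

The curve $C$ is transformed to $\tilde{C}$ by a group transformation $h$ induced
by a vector field $\mathbf{v}$ as shown in Fig. 4. Consider a $k$-th order
prolonged space, whose coordinates are $x$, $y$ and derivatives of $y$ with
respect to $x$ up to $k$-th order, so that the prolonged space is $k + 2$
dimensional. The curves $C$ and $\tilde{C}$ in 2D space are prolonged and
described by space curves $C^{(k)}$ and $\tilde{C}^{(k)}$ in the $k + 2$
dimensional prolonged space. The prolonged vector field $\mathbf{v}^{(k)}$ is a
vector field in $k + 2$ dimension, which carries the prolonged curve $C^{(k)}$ to
the prolonged curve $\tilde{C}^{(k)}$ explicitly, as shown in Fig. 4. More
precisely, the $k$-th order prolongation $\mathbf{v}^{(k)}$ of a vector field
$\mathbf{v}$ is defined so that it transforms the $k$-th order derivatives
$y^{(k)}$ of a function $y = f(x)$ into the corresponding $k$-th order derivatives
$\tilde{y}^{(k)}$ of the transformed function $\tilde{y} = \tilde{f}(\tilde{x})$
geometrically.

Let $\mathbf{v}_i$ $(i = 1, \ldots, m)$ be $m$ independent vector fields induced
by a group transformation $h$. Since the prolongation is linear, the $k$-th
prolongation $\mathbf{v}^{(k)}$ of a general vector field $\mathbf{v}$ can be
described by a sum of $k$-th prolongations $\mathbf{v}_i^{(k)}$ of the independent
vector fields as follows:

$$\mathbf{v}^{(k)} = \sum_{i=1}^{m} \mathbf{v}_i^{(k)}$$

Consider a vector field (5) in 2D space again. Its first and second
prolongations $\mathbf{v}_i^{(1)}$, $\mathbf{v}_i^{(2)}$ are computed as follows [18]:

$$\mathbf{v}_i^{(1)} = \mathbf{v}_i + \left( D_x(\eta_i - \xi_i y_x) + \xi_i y_{xx} \right) \frac{\partial}{\partial y_x} \qquad (12)$$

$$\mathbf{v}_i^{(2)} = \mathbf{v}_i^{(1)} + \left( D_x^2(\eta_i - \xi_i y_x) + \xi_i y_{xxx} \right) \frac{\partial}{\partial y_{xx}} \qquad (13)$$

where $D_x$ and $D_x^2$ denote the first and the second total derivatives with
respect to $x$, and $y_x$, $y_{xx}$, $y_{xxx}$ denote the first, second and third
derivatives of $y$ with respect to $x$.

Fig. 4. Prolongation of a vector field. (Panels: original image, transformed
image, and the corresponding prolonged spaces.) The $k$-th order prolonged vector
field $\mathbf{v}^{(k)}$ transforms $k$-th order derivatives of $y$ into $k$-th
order derivatives of $\tilde{y}$. That is, the prolonged curve $C^{(k)}$ is
transformed into the prolonged curve $\tilde{C}^{(k)}$ by the prolonged vector
field $\mathbf{v}^{(k)}$. This enables us to investigate derivatives of functions
geometrically. $\mathrm{pr}^{(k)}$ denotes $k$-th order prolongation. This figure
illustrates the first order prolongation ($k = 1$).

Let $I(y^{(n)})$ be a function of $x$, $y$ and derivatives of $y$ with respect to
$x$ up to $n$-th order, which is denoted by $y^{(n)}$. Since the prolongation
describes how the derivatives are going to change under group transformations,
we can compute the change in function $\delta I$ caused by the group
transformation $h$ as follows:

$$\delta I = \mathbf{v}^{(n)}[I]$$

where $\mathbf{v}^{(n)}$ is the $n$-th order prolongation of the infinitesimal
generator $\mathbf{v}$ of a transformation $h$. Note that we require only the
same order of prolongation as that of the function $I$. Since the prolongation
describes how derivatives are going to change, it is important for evaluating the
quasi-invariance of a differential formula as described in the next section.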
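As a worked instance of the prolongation formulae, take the uniform scaling field $\mathbf{v} = x\,\partial/\partial x + y\,\partial/\partial y$, so that $\xi = x$ and $\eta = y$ (this field is chosen here purely for illustration). The coefficient of $\partial/\partial y_x$ in the first prolongation is $D_x(\eta - \xi y_x) + \xi y_{xx}$, and the coefficient of $\partial/\partial y_{xx}$ in the second prolongation is $D_x^2(\eta - \xi y_x) + \xi y_{xxx}$, which gives:

```latex
\begin{aligned}
\eta - \xi y_x &= y - x y_x, \\
D_x(y - x y_x) + x y_{xx} &= (y_x - y_x - x y_{xx}) + x y_{xx} = 0, \\
D_x^2(y - x y_x) + x y_{xxx} &= D_x(-x y_{xx}) + x y_{xxx} = -y_{xx}, \\
\mathbf{v}^{(1)} &= \mathbf{v}, \qquad
\mathbf{v}^{(2)} = \mathbf{v} - y_{xx}\,\frac{\partial}{\partial y_{xx}} .
\end{aligned}
```

The result matches intuition: a uniform scaling leaves the slope $y_x$ unchanged, while the second derivative $y_{xx}$ shrinks as the curve is enlarged.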



4.2 Quasi-Invariance on Smooth Manifolds 

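Let us consider the curve $C$ in 2D space again. Suppose $I(y^{(n)})$ is a
function on the curve containing the derivatives of $y$ with respect to $x$ up to
the $n$-th order, which we denote by $y^{(n)}$. Since the $n$-th order
prolongation $\mathbf{v}^{(n)}$ of the vector field $\mathbf{v}$ transforms
$n$-th order derivatives $y^{(n)}$ of the original curve to $n$-th order
derivatives $\tilde{y}^{(n)}$ of the transformed curve, the change in function
$\delta I_i(y^{(n)})$ caused by the infinitesimal group transformation induced by
the $i$-th independent vector field $\mathbf{v}_i$ is described by:

$$\delta I_i(y^{(n)}) = \mathbf{v}_i^{(n)}[I] \qquad (14)$$

A quasi-invariant is a function whose variation caused by group transformations
is relatively small compared with its original value. We thus define a measure of
invariance $Q$ on a smooth curve $C$ by the normalised square sum of
$\delta I_i(y^{(n)})$ integrated along the curve $C$ as follows:

$$Q = \int_C \sum_{i=1}^{m} \left( \frac{\delta I_i(y^{(n)})}{I(y^{(n)})} \right)^2 dx \qquad (15)$$

If $I(y^{(n)})$ is close to the exact invariant, then $Q$ tends to zero. Thus, $Q$
is a measure of how invariant the function $I(y^{(n)})$ is under the group
transformation.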



5 Quasi-Invariant Parameterisation 



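In the last section, we have derived quasi-invariance on smooth manifolds. We
now apply the results and investigate the quasi-invariance of parameterisation
under group transformations.

A group arc-length, $\tau$, of a curve $C$ is in general described by a group
metric, $g$, and the independent variable, $x$, of the curve as follows:

$$d\tau = g \, dx$$

where $d\tau$ and $dx$ are the differentials of $\tau$ and $x$ respectively.
Suppose the metric $g$ is described by the derivatives of $y$ with respect to $x$
up to $k$-th order as follows:

$$g = g(y^{(k)})$$

where $\mathbf{v}_i^{(k)}$ below denotes the $k$-th order prolongation of
$\mathbf{v}_i$. The change of the differential $d\tau$ caused by the $i$-th
independent vector field $\mathbf{v}_i$ is thus derived by computing the Lie
derivative of $d\tau$ with respect to the $k$-th order prolongation of
$\mathbf{v}_i$:

$$\delta d\tau_i = \mathcal{L}_{\mathbf{v}_i^{(k)}}[d\tau] = \left( \mathbf{v}_i^{(k)}[g] + g \frac{d\xi_i}{dx} \right) dx \qquad (16)$$

The change in $d\tau$ normalised by $d\tau$ itself is described as follows:

$$\delta d\bar{\tau}_i = \frac{\delta d\tau_i}{d\tau} = \frac{1}{g} \mathbf{v}_i^{(k)}[g] + \frac{d\xi_i}{dx} \qquad (17)$$

The measure of invariance of the parameter $\tau$ is thus described by integrating
the square sum of $\delta d\bar{\tau}_i$ along the curve $C$ as follows:

$$Q = \int_C L \, dx \qquad (18)$$

where

$$L = \sum_{i=1}^{m} \left( \frac{1}{g} \mathbf{v}_i^{(k)}[g] + \frac{d\xi_i}{dx} \right)^2 \qquad (19)$$

If the parameter is close to the exact invariant parameter, $Q$ tends to zero.
Although there is no exact invariant parameter unless it has enough order of
derivatives, there still exists a parameter which minimises $Q$ and requires only
lower order derivatives. We call such a parameter a quasi-invariant parameter
of the group. A necessary condition for $Q$ to have a minimum is that its first
variation $\delta Q$ vanishes:

$$\delta Q = 0 \qquad (20)$$

This is a variational problem of one independent variable $x$ and two dependent
variables. It is known that (20) holds if and only if its Euler-Lagrange
expression vanishes as follows [18]:

$$E[L] = 0 \qquad (21)$$

where $E$ denotes the Euler operator.

In the next section, we consider the affine case, and derive a metric $g$ which
minimises $Q$ under general affine transformations.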

6 Affine Quasi-Invariant Parameterisation

In this section, we apply quasi-invariance to derive a quasi-invariant
parameterisation under general affine transformations, which requires only second
order derivatives and is thus less sensitive to noise than the exact invariant
parameter, which requires fourth order derivatives.

Suppose the quasi-invariant parameterisation $\tau$ under general affine
transformation is of second order, so that the metric $g$ of the parameter $\tau$
is made of derivatives up to the second:

$$d\tau = g(y_x, y_{xx}) \, dx \qquad (22)$$

where $y_x$ and $y_{xx}$ are the first and the second derivatives of $y$ with
respect to $x$. To find a quasi-invariant parameter is thus the same as finding a
second order differential function $g(y_x, y_{xx})$ which minimises the
quasi-invariance $Q$ under general affine transformations. Since the metric $g$
is of second order, we require the second order prolongation of the vector fields
to compute the quasi-invariance of the metric.

Fig. 5. Prolongation of affine vector fields. (a), (b), (c) and (d) show the
original and the prolonged affine vector fields (i.e. divergence, curl, and two
deformation components). (Panel titles include: (c1) prolonged def1 vector field,
(c2) top view, (d1) prolonged def2 vector field, (d2) top view.)

6.1 Prolongation of Affine Vector Fields 

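A two dimensional general affine transformation is described by a $2 \times 2$
invertible matrix $A \in GL(2)$ and a translational component
$\mathbf{t} \in R^2$, and transforms $\mathbf{x} \in R^2$ into
$\tilde{\mathbf{x}} \in R^2$ as follows:

$$\tilde{\mathbf{x}} = A \mathbf{x} + \mathbf{t}$$

Since the differential form $d\tau$ in (22) does not include $x$ and $y$
components, it is invariant under translations. Thus, we simply consider the
action of $A \in GL(2)$, which can be described by four independent vector fields
$\mathbf{v}_i$ $(i = 1, \ldots, 4)$, that is the divergence, curl, and the two
components of deformation [13]:

$$\begin{aligned}
\mathbf{v}_1 &= x \frac{\partial}{\partial x} + y \frac{\partial}{\partial y}, &
\mathbf{v}_2 &= -y \frac{\partial}{\partial x} + x \frac{\partial}{\partial y}, \\
\mathbf{v}_3 &= x \frac{\partial}{\partial x} - y \frac{\partial}{\partial y}, &
\mathbf{v}_4 &= y \frac{\partial}{\partial x} + x \frac{\partial}{\partial y}
\end{aligned} \qquad (23)$$

Since the general linear group $GL(2)$ is not semi-simple, the Killing form (9)
is degenerate, and there is no unique choice of vector fields of the group (see
section 3.3). It is however decomposed into the radical, which corresponds to the
divergence, and the special linear group $SL(2)$, which is semi-simple and whose
intrinsic vector fields coincide with $\mathbf{v}_2$, $\mathbf{v}_3$ and
$\mathbf{v}_4$ in (23). Thus, we use the vector fields in (23) for computing the
quasi-invariance of differential forms under general affine transformations.

From (12), (13) and (23), the second prolongations of these vector fields are
computed by:

$$\begin{aligned}
\mathbf{v}_1^{(2)} &= \mathbf{v}_1 - y_{xx} \frac{\partial}{\partial y_{xx}} \\
\mathbf{v}_2^{(2)} &= \mathbf{v}_2 + (1 + y_x^2) \frac{\partial}{\partial y_x} + 3 y_x y_{xx} \frac{\partial}{\partial y_{xx}} \\
\mathbf{v}_3^{(2)} &= \mathbf{v}_3 - 2 y_x \frac{\partial}{\partial y_x} - 3 y_{xx} \frac{\partial}{\partial y_{xx}} \\
\mathbf{v}_4^{(2)} &= \mathbf{v}_4 + (1 - y_x^2) \frac{\partial}{\partial y_x} - 3 y_x y_{xx} \frac{\partial}{\partial y_{xx}}
\end{aligned} \qquad (24)$$

These are vector fields in four dimension, whose coordinates are $x$, $y$, $y_x$
and $y_{xx}$, and the projection of these vector fields onto the $x$-$y$ plane
coincides with the original affine vector fields (23) in two dimension. To give
an idea of prolongations, the first prolongations of affine vector fields are
shown in Fig. 5. Unfortunately, the second prolongations cannot be shown in
figures, since they are four dimensional vector fields. However, the first
prolongations are the projection of second prolongations, and thus Fig. 5 may
help the readers to infer the structure of second prolongations.

Since $d\tau$ is of second order, the prolonged vector fields
$\mathbf{v}_1^{(2)}, \ldots, \mathbf{v}_4^{(2)}$ describe how the parameter
$\tau$ is going to change under general affine transformations.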
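To make the role of the prolonged fields concrete, the sketch below evaluates, for each of the four affine fields (divergence, curl and the two deformations), the normalised change of a metric $g(y_x, y_{xx})$: the second prolongation applied to $g$, divided by $g$, plus the total derivative $d\xi_i/dx$. It is a numerical illustration only. The power-law family `power_law_metric`, the sample point and all function names are assumptions introduced here, and the partial derivatives of `g` are approximated by central differences. The two metrics tested are the well-known special affine arc-length ($y_{xx}^{1/3} dx$) and the similarity-invariant parameter ($\kappa$ times Euclidean arc-length, i.e. $y_{xx}/(1 + y_x^2)\,dx$).

```python
def prolonged_fields(p, q):
    """Coefficients of d/dy_x and d/dy_xx of each second prolongation,
    followed by the total derivative D_x(xi); p = y_x and q = y_xx."""
    return {
        "divergence": (0.0, -q, 1.0),
        "curl": (1.0 + p * p, 3.0 * p * q, -p),
        "deformation1": (-2.0 * p, -3.0 * q, 1.0),
        "deformation2": (1.0 - p * p, -3.0 * p * q, p),
    }


def normalised_changes(g, p, q, h=1e-6):
    """Return (1/g) * v_i^(2)[g] + D_x(xi_i) for each affine field."""
    g0 = g(p, q)
    g_p = (g(p + h, q) - g(p - h, q)) / (2.0 * h)
    g_q = (g(p, q + h) - g(p, q - h)) / (2.0 * h)
    return {name: (cp * g_p + cq * g_q) / g0 + dxi
            for name, (cp, cq, dxi) in prolonged_fields(p, q).items()}


def power_law_metric(a, b):
    """Hypothetical family g = y_xx**a * (1 + y_x**2)**b."""
    return lambda p, q: q ** a * (1.0 + p * p) ** b


# Special affine arc-length: g = y_xx**(1/3).
special_affine = normalised_changes(power_law_metric(1.0 / 3.0, 0.0), 0.5, 2.0)
# Similarity-invariant parameter: g = y_xx / (1 + y_x**2).
similarity = normalised_changes(power_law_metric(1.0, -1.0), 0.5, 2.0)

for name in special_affine:
    print(name, round(special_affine[name], 6), round(similarity[name], 6))
```

As expected, the special affine metric is unchanged by curl and both deformations but not by divergence, whereas the similarity metric is unchanged by divergence and curl but not by the deformations. A quasi-invariant metric trades these residual terms off against each other.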

6.2 Affine Quasi-Invariant Parameterisation 

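By substituting the affine prolonged vector fields $\mathbf{v}_1^{(2)}$,
$\mathbf{v}_2^{(2)}$, $\mathbf{v}_3^{(2)}$ and $\mathbf{v}_4^{(2)}$ into (19) and
solving (21), we find that $\delta Q$ vanishes for any curve $y = f(x)$ if the
following function $g$ is chosen [24]: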

F FxxH^ + lty^ (2) 

We conclude that for any curve the following parameter $\tau$ is quasi-invariant
under general affine transformations:

FF FxxH^ + iiy^rr (2) 

By reformalising (26), we find that the parameter $\tau$ is described by the
Euclidean arc-length $dv$ and the Euclidean curvature $\kappa$ as follows:

FF FilF (2 ) 

Thus, $d\tau$ is in fact an exact invariant under rotation, and quasi-invariant
under divergence and deformation. Note that it is known that the invariant
parameter under similarity transformations is $\kappa \, dv$ and that of special
affine transformations is $\kappa^{1/3} dv$. The derived quasi-invariant parameter
$\tau$ for general affine transformations is between these two, as expected. We
call $\tau$ the affine quasi-invariant parameter (arc-length). Since the new
parameter requires only the second order derivatives, it is expected to be less
sensitive to noise than the exact invariant parameter under general affine
transformations.

By using $\tau$ for $w$ in (2), we can define affine quasi semi-local invariants.
These invariants require only second order derivatives, and do not require any
distinguished points on curves.

(a) Integral invariants (std 0.1)   (b) Differential invariants (std 0.1)
(c) Integral invariants (std 0.5)   (d) Differential invariants (std 0.5)

Fig. 6. Results of noise sensitivity analysis. The invariant signatures of an
artificial curve are derived from the proposed invariants (semi-local invariants
based on affine quasi-invariant parameterisation) and the affine differential
invariants (affine curvature), and are shown by thick lines in (a) and (b)
respectively. The dots in (a) and (b) show signatures after adding random
Gaussian noise of std 0.1 pixels, and the dots in (c) and (d) show signatures
after adding random Gaussian noise of std 0.5 pixels. The thin lines show the
uncertainty bounds of the signatures estimated by the linear perturbation
method. The signatures from the proposed method are much more stable than
those of differential invariants.



7 Experiments 

7.1 Noise Sensitivity of Quasi Invariants 

We first compare the noise sensitivity of the semi-local invariants based on the
proposed affine quasi-invariant parameterisation with that of affine differential
invariants, i.e. affine curvature.

The invariant signatures of an artificial curve have been computed from the
proposed quasi-invariants and the affine curvature, and are shown by solid lines
in Fig. 6 (a) and (b). The dots in (a) and (b) show the invariant signatures after
adding random Gaussian noise of standard deviation of 0.1 pixels to the position
data of the curve, and the dots in (c) and (d) show those of standard deviation
of 0.5 pixels. As we can see in these signatures, the proposed invariants are much
less sensitive to noise than the differential invariants. This is simply because
the proposed invariants require only second order derivatives, while differential
invariants require fourth order derivatives. The thin lines show the results of
noise sensitivity analysis derived by the linear perturbation method.

7.2 Curve Matching Experiments 

Next we show preliminary results of curve matching experiments under relative
motion between an observer and objects.

Fig. 7 (a) and (b) show the images of natural leaves taken from two different
viewpoints. The white lines in these images show example contour curves
extracted from B-spline fitting [5]. As we can see in these curves, because of the
viewer motion, the curves are distorted and occluded partially. Since the leaf is
nearly flat and the extent of the leaf is much less than the distance from the
camera to the leaf, we can assume that the corresponding curves are related by
a general affine transformation.

The computed invariant signatures of the original and the distorted curves
are shown in Fig. 7 (c). One of these two signatures was shifted horizontally,
minimising the total difference between these two signatures. The corresponding
points on the contour curves were extracted by taking identical points in these
two signatures, and are shown in Fig. 7 (d) and (e). Note that the extracted
corresponding curves are fairly accurate. In this experiment, we have chosen
an interval of 3.0 for computing invariant signatures.
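The alignment step described above, shifting one signature horizontally so that the total difference between the two is minimised, can be sketched as a brute-force search over integer shifts. This is only an illustration under stated assumptions: both signatures are taken to be sampled at the same uniform step in quasi-invariant arc-length, the cost is the mean squared difference over the overlapping samples (normalising by the overlap length is a choice made here so that short overlaps are not favoured), and `best_shift` and the synthetic signatures are hypothetical.

```python
import math


def best_shift(sig_a, sig_b, max_shift):
    """Return (shift, cost) minimising the mean squared difference
    between sig_a[i] and sig_b[i + shift] over their overlap."""
    best = None
    for shift in range(-max_shift, max_shift + 1):
        pairs = [(sig_a[i], sig_b[i + shift])
                 for i in range(len(sig_a))
                 if 0 <= i + shift < len(sig_b)]
        if len(pairs) < 2:
            continue  # not enough overlap to compare
        cost = sum((u - v) ** 2 for u, v in pairs) / len(pairs)
        if best is None or cost < best[1]:
            best = (shift, cost)
    return best


# Synthetic signatures: sig_b is sig_a delayed by five samples, with
# extra samples at both ends to mimic partial occlusion.
sig_a = [math.sin(0.3 * i) for i in range(40)]
sig_b = [math.sin(0.3 * (j - 5)) for j in range(50)]

shift, cost = best_shift(sig_a, sig_b, max_shift=10)
print(shift, cost)
```

Once the shift is known, sample `i` of the first signature corresponds to sample `i + shift` of the second, which gives the point correspondences on the two contours.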

7.3 Extracting Symmetry Axes 

We next apply the quasi-invariants for extracting the symmetry axes of three
dimensional objects. Extracting symmetry [9, 10, 28] of objects in images is very
important for recognising objects [15, 29], focusing attention [21] and
controlling robots [3] reliably. It is well known that the corresponding contour
curves of a planar bilateral symmetry can be described by special affine
transformations [12, 28]. In this section, we consider a class of symmetry which
is described by a general affine transformation.

Consider a planar object to have bilateral symmetry with an axis. Suppose
the planar object can be separated into two planes at the axis, and is connected
by a hinge so that the two planes can rotate around this axis as shown in
Fig. 8 (a). The objects derived by rotating these two planes have a 3D bilateral
symmetry. This class of symmetry is also common in artificial and natural
objects such as butterflies and other flying insects. Since the distortion in images
caused by a three dimensional motion of a planar object can be described by a
general affine transformation, this class of symmetry can also be described by
general affine transformations under the weak perspective assumption. Thus, the
corresponding two curves of this symmetry have the same invariant signatures
under general affine transformations.

(a) viewpoint 1   (b) viewpoint 2   (c) invariant signatures (horizontal axis:
affine quasi-invariant arc-length)   (d) matched curves   (e) matched curves

Fig. 7. Curve matching experiment. Images of natural leaves from the first and
the second viewpoints are shown in (a) and (b). The white lines in these images
show extracted contour curves. The quasi-invariant arc-length and semi-local
invariants are computed from the curves in (a) and (b), and are shown in (c) by
solid and dashed lines. (d) and (e) show the corresponding curves extracted from
the invariant signatures (c).

Fig. 8. Bilateral symmetry with rotation. The left and the right parts of an
object with bilateral symmetry are rotated with respect to the symmetry axis
in (a). The intersection point of the two tangent lines at a pair of corresponding
points of a bilateral symmetry with rotation lies on the symmetry axis in (b). If
we have several such cross points, the symmetry axis can be computed by fitting
a line to these cross points.

We next show the results of extracting symmetry axes of 3D bilateral
symmetry. Fig. 9 (a) shows an image of a butterfly (Small White) with a flower.
Since the two wings of the butterfly are not coplanar, the corresponding contour
curves of the two wings are related by a general affine transformation as
described above. Fig. 9 (b) shows example contour curves extracted from (a).
Note that not all the points on the curves have correspondences because of the
lack of edge data and the presence of spurious edges. The solid and dashed lines
in Fig. 9 (c) show the invariant signatures computed from the left and the right
wings in (b) respectively. (In this example, a fixed interval was chosen for
computing semi-local invariants.) Since the signatures are invariant up to a
shift, we have simply reflected and shifted one invariant signature horizontally,
minimising the total difference between the two signatures.

As shown in these signatures, semi-local invariants based on quasi-invariant
parameterisation are quite accurate and stable. Corresponding points are derived
by taking the identical points on these two signatures, and shown in Fig. 9 (d)
by connecting the corresponding points. Tangent lines at every corresponding
pair of points are computed and displayed in Fig. 9 (e) by white lines. The cross
points of every pair of tangent lines are extracted and shown in Fig. 9 (e) by
square dots. The symmetry axis of the butterfly is extracted by fitting a line to
the cross points of tangent lines, and shown in Fig. 9 (f). Although the extracted
contour curves include asymmetric parts as shown in (b), the computed axis
of symmetry agrees with the body of the butterfly quite well. In contrast, purely
global methods, e.g. moment based methods [9, 10], would not work in such
cases. These results show the power and usefulness of the proposed semi-local
invariants and quasi-invariant parameterisation.

(a) original image   (b) contour curves   (e) tangent lines   (f) symmetry axis

Fig. 9. Extraction of the axis of bilateral symmetry with rotation. (a) shows the
original image of a butterfly (Small White) perched on a flower. (b) shows an
example of contour curves. The invariant signatures of these curves are computed
from the quasi-invariant arc-length and semi-local invariants. (c) shows
the extracted invariant signatures of the left and the right curves in (b). The
black lines in (d) connect pairs of corresponding points extracted from the
invariant signatures in (c). The white lines and the square dots in (e) show the
tangent lines for the corresponding points and their cross points. The white line
in (f) shows the symmetry axis of the butterfly extracted by fitting a line to the
cross points.
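The geometric step used above, intersecting the tangent lines at each pair of corresponding points and fitting a line to the resulting cross points, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the line fit is a total least squares fit through the centroid, the example data are a plain mirror symmetry about the vertical axis (a simplifying assumption), and all function names and numbers are hypothetical.

```python
import math


def intersect(p, d, q, e):
    """Intersection of the line p + s*d with the line q + u*e.
    Returns None if the lines are (nearly) parallel."""
    denom = d[0] * e[1] - d[1] * e[0]
    if abs(denom) < 1e-12:
        return None
    s = ((q[0] - p[0]) * e[1] - (q[1] - p[1]) * e[0]) / denom
    return (p[0] + s * d[0], p[1] + s * d[1])


def fit_line(points):
    """Total least squares line fit: returns (centroid, unit direction)."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points)
    syy = sum((y - cy) ** 2 for _, y in points)
    sxy = sum((x - cx) * (y - cy) for x, y in points)
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (cx, cy), (math.cos(theta), math.sin(theta))


# Hypothetical corresponding points with tangent directions, mirrored
# about the line x = 0: (left point, left tangent, right point, right tangent).
pairs = [
    ((-1.0, 0.0), (1.0, 2.0), (1.0, 0.0), (-1.0, 2.0)),
    ((-2.0, 1.0), (1.0, 1.0), (2.0, 1.0), (-1.0, 1.0)),
    ((-1.5, -1.0), (3.0, 1.0), (1.5, -1.0), (-3.0, 1.0)),
]

cross_points = [intersect(p, d, q, e) for p, d, q, e in pairs]
centroid, direction = fit_line(cross_points)
print(cross_points, centroid, direction)
```

In this toy example every cross point has zero x-coordinate, so the fitted axis is the vertical line through the origin. With real contours the cross points are noisy, which is why a least squares fit over many pairs is used.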

8 Discussion 

In this paper, we have shown that there exist quasi-invariant parameterisations
which are not exactly invariant but approximately invariant under group
transformations and do not require high order derivatives. The affine
quasi-invariant parameterisation is derived and applied for matching of curves
under the weak perspective assumption.

Although the range of transformations is limited, the proposed method is
useful for many cases, especially for curve matching under relative motion
between a viewer and objects, since the movements of a camera and objects are
in general limited. We now discuss the properties of the proposed parameterisation.

1. Noise Sensitivity

Since quasi-invariant parameters enable us to reduce the order of derivatives
required, they are much less sensitive to noise than exact invariant
parameters. Thus using the quasi-invariant parameterisation is the same as
finding the best tradeoff between the systematic error caused by the
approximation and the error caused by the noise. The derived parameters are
more feasible than traditional invariant parameters.

2. Limitation of the Amount of Motion

The proposed quasi-invariant parameter assumes the group motion to be
limited to a small amount. In the affine case, this limitation is about 0.1 in
magnitude for the coefficients $a_1$, $a_3$ and $a_4$ of the divergence and
the deformation components (there is no limitation on the curl component
$a_2$). Since, in many computer vision applications, the distortion of the
image is small due to the limited speed of the relative motion between a
camera and the scene or the finite distance between two cameras in a stereo
system, we believe the proposed parameterisation can be exploited in many
applications.

References 

1. E.B. Barrett and P.M. Payton. General methods for determining projective
   invariants in imagery. Computer Vision, Graphics and Image Processing, Vol. 53,
   No. 1, pp. 46-65, 1991.
2. T.O. Binford and T.S. Levitt. Quasi-invariant: Theory and exploitation. In Proc.
   DARPA Image Understanding Workshop, pp. 819-829, 1993.
3. A. Blake. A symmetry theory of planar grasp. International Journal of Robotics
   Research, Vol. 14, No. 5, pp. 425-444, 1995.
4. A.M. Bruckstein, R.J. Holt, A.N. Netravali, and T.J. Richardson. Invariant
   signatures for planar shape recognition under partial occlusion. Computer Vision,
   Graphics and Image Processing, Vol. 58, No. 1, pp. 49-65, 1993.
5. T.J. Cham and R. Cipolla. Automated B-spline curve representation with
   MDL-based active contours. In Proc. 7th British Machine Vision Conference, Vol. 2,
   pp. 363-372, Edinburgh, September 1996.
6. R. Cipolla and A. Blake. Surface orientation and time to contact from image
   divergence and deformation. In G. Sandini, editor, Proc. 2nd European Conference
   on Computer Vision, pp. 187-202, Santa Margherita, Italy, 1992. Springer-Verlag.
7. D. Cyganski, J.A. Orr, T.A. Cott, and R.J. Dodson. An affine transformation
   invariant curvature function. In Proc. 1st International Conference on Computer
   Vision, pp. 496-500, London, 1987.
8. D.A. Forsyth, J.L. Mundy, A. Zisserman, and C.A. Rothwell. Recognising
   rotationally symmetric surfaces from their outlines. In G. Sandini, editor, Proc.
   2nd European Conference on Computer Vision, Santa Margherita, Italy, 1992.
   Springer-Verlag.
9. S.A. Friedberg. Finding axes of skewed symmetry. Computer Vision, Graphics
   and Image Processing, Vol. 34, pp. 138-155, 1986.
10. A.D. Gross and T.E. Boult. Analyzing skewed symmetries. International Journal
    of Computer Vision, Vol. 13, No. 1, pp. 91-111, 1994.
11. M. Hu. Visual pattern recognition by moment invariants. IRE Transaction on
    Information Theory, Vol. IT-8, pp. 179-187, February 1962.
12. T. Kanade and J.R. Kender. Mapping image properties into shape constraints:
    Skewed symmetry, affine-transformable patterns, and the shape-from-texture
    paradigm. In J. Beck et al., editor, Human and Machine Vision, pp. 237-257.
    Academic Press, NY, 1983.
13. J.J. Koenderink and A.J. van Doorn. Geometry of binocular vision and a model
    for stereopsis. Biological Cybernetics, Vol. 21, pp. 29-35, 1976.
14. S. Lie. Gesammelte Abhandlungen, Vol. 6. Teubner, Leipzig, 1927.
15. R. Mohan and R. Nevatia. Perceptual organization for scene segmentation and
    description. IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 14,
    No. 6, pp. 616-635, 1992.
16. T. Moons, E.J. Pauwels, L.J. Van Gool, and A. Oosterlinck. Foundations of
    semi-differential invariants. International Journal of Computer Vision, Vol. 14,
    pp. 25-47, 1995.
17. J.L. Mundy and A. Zisserman. Geometric Invariance in Computer Vision.
    MIT Press, Cambridge, USA, 1992.
18. P.J. Olver. Applications of Lie Groups to Differential Equations. Springer-Verlag,
    1986.
19. P.J. Olver, G. Sapiro, and A. Tannenbaum. Differential invariant signatures and
    flows in computer vision. In B.M. ter Haar Romeny, editor, Geometry-Driven
    Diffusion in Computer Vision, pp. 255-306. Kluwer Academic Publishers, 1994.
20. E.J. Pauwels, T. Moons, L.J. Van Gool, P. Kempenaers, and A. Oosterlinck.
    Recognition of planar shapes under affine distortion. International Journal of
    Computer Vision, Vol. 14, pp. 49-65, 1995.
21. D. Reisfeld, H. Wolfson, and Y. Yeshurun. Context-free attentional operators: The
    generalized symmetry transform. International Journal of Computer Vision,
    Vol. 14, No. 2, pp. 119-130, 1995.
22. C.A. Rothwell, A. Zisserman, D.A. Forsyth, and J.L. Mundy. Planar object
    recognition using projective shape representation. International Journal of
    Computer Vision, Vol. 16, pp. 57-99, 1995.
23. J. Sato and R. Cipolla. Affine integral invariants for extracting symmetry axes.
    In Proc. 7th British Machine Vision Conference, Vol. 1, pp. 63-72, Edinburgh,
    September 1996.
24. J. Sato and R. Cipolla. Quasi-invariant parameterisations and matching of curves
    in images. International Journal of Computer Vision, Vol. 28, No. 2, pp. 117-136,
    1998.
25. D.H. Sattinger and O.L. Weaver. Lie groups and algebras with applications to
    physics, geometry and mechanics. Springer-Verlag, New York, 1986.
26. G. Taubin and D.B. Cooper. Object recognition based on moment (or algebraic)
    invariants. In J.L. Mundy and A. Zisserman, editors, Geometric Invariance in
    Computer Vision, pp. 375-397, 1992.
27. L.J. Van Gool, T. Moons, E. Pauwels, and A. Oosterlinck. Semi-differential
    invariants. In J.L. Mundy and A. Zisserman, editors, Geometric Invariance
    in Computer Vision, pp. 157-192, 1992.
28. L.J. Van Gool, T. Moons, D. Ungureanu, and A. Oosterlinck. The characterization
    and detection of skewed symmetry. Computer Vision and Image Understanding,
    Vol. 61, No. 1, pp. 138-150, 1995.
29. L.J. Van Gool, T. Moons, D. Ungureanu, and E. Pauwels. Symmetry from shape
    and shape from symmetry. International Journal of Robotics Research, Vol. 14,
    No. 5, pp. 407-424, 1995.
30. I. Weiss. Projective invariants of shapes. In Proc. Image Understanding Workshop,
    Vol. 2, pp. 1125-1134, 1988.
31. M. Zerroug and R. Nevatia. Three-dimensional descriptions based on the analysis
    of the invariant and quasi-invariant properties of some curved-axis generalized
    cylinders. IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 18, No. 3,
    pp. 237-253, 1996.
32. A. Zisserman, D.A. Forsyth, J.L. Mundy, and C.A. Rothwell. Recognizing general
    curved objects efficiently. In J.L. Mundy and A. Zisserman, editors, Geometric
    Invariance in Computer Vision, pp. 228-251, 1992.




Representations for Recognition 
Under Variable Illumination* 



David J. Kriegman^1, Peter N. Belhumeur^2, and Athinodoros S. Georghiades^2

^1 Beckman Institute and Department of Computer Science
University of Illinois at Urbana-Champaign
405 N. Mathews Avenue
Urbana, IL 61801, USA

^2 Center for Computational Vision and Control
Department of Electrical Engineering
Yale University
New Haven, CT 06520-8267, USA



Abstract. Due to illumination variability, the same object can appear dramatically different even when viewed in fixed pose. Consequently, an object recognition system must employ a representation that is either invariant to, or models, this variability. This chapter presents an appearance-based method for modeling this variability. In particular, we prove that the set of n-pixel monochrome images of a convex object with a Lambertian reflectance function, illuminated by an arbitrary number of point light sources at infinity, forms a convex polyhedral cone in $\mathbb{R}^n$ and that the dimension of this illumination cone equals the number of distinct surface normals. For a non-convex object with a more general reflectance function, the set of images is also a convex cone. Geometric properties of these cones for monochrome and color cameras are considered. Here, we present a method for constructing a cone representation from a small number of images when the surface is continuous, possibly non-convex, and Lambertian; this accounts for both attached and cast shadows. For a collection of objects, each object is represented by a cone, and recognition is performed through nearest neighbor classification by measuring the minimal distance of an image to each cone. We demonstrate the utility of this approach to the problem of face recognition (a class of non-convex and non-Lambertian objects with similar geometry). The method is tested on a database of 660 images of 10 faces, and the results exceed those of popular existing methods.



1 Introduction 

One of the complications that has troubled computer vision recognition algorithms is the variability of an object’s appearance from one image to the next. With slight changes in lighting conditions and viewpoint often come large changes in the object’s appearance. To handle this variability, methods usually take one of two approaches: either measure some property in the image of the object which is, if not invariant, at least insensitive to the variability in the imaging conditions, or model the object, or part of the object, in order to predict the variability.

* D. J. Kriegman and A. S. Georghiades were supported under NSF NYI IRI-9257990 and ARO DAAG55-98-1-0168. P. N. Belhumeur was supported by a Presidential Early Career Award, an NSF Career Award IRI-9703134, and ARO grant DAAH04-95-1-0494.

D.A. Forsyth et al. (Eds.): Shape, Contour LNCS 1681, pp. 95-131, 1999.
(c) Springer-Verlag Berlin Heidelberg 1999

Nearly all approaches to object recognition have handled the variability due 
to illumination by using the first approach; they have, for example, concentrated 
on edges, i.e. the discontinuities in the image intensity. Because discontinuities in 
the albedo on the object’s surface or discontinuities in albedo across the object’s 
boundary generate edges in images, these edges tend to be insensitive to a range 
of illumination conditions [5].

Yet, edges do not contain all of the information useful for recognition. Fur- 
thermore, objects which are not simple polyhedra or are not composed of piece- 
wise constant albedo patterns often produce inconsistent edge maps. The top of 
Fig. 1 shows two images of a person with the same facial expression and pho- 
tographed from the same viewpoint. The variability in these two images due to 
differences in illumination is dramatic: not only does it lead to a change in con- 
trast, but also to changes in the configuration of the shadows, i.e. certain regions 
are shadowed in the left image, but illuminated in the right, and vice versa. The 
edge maps in the bottom half of Fig. 1 are produced from these images. Due 
to the variation in illumination, only a small fraction of the edges are common 
between images. Figure 9 shows another example of extreme illumination vari- 
ation in images; in this case observe the extreme variability in the images of a 
single individual illuminated by a single light source in different locations. 

The reason most approaches have avoided using the rest of the intensity 
information is because its variability under changing illumination has been diffi- 
cult to tame. Methods have recently been introduced which use low-dimensional 
representations of images to perform recognition, see for example [15, 27, 39]. 
These methods, often termed appearance-based methods, differ from feature- 
based methods in that their low-dimensional representation is, in a least-squared 
sense, faithful to the original image. Systems such as SLAM [27] and Eigenfaces 
[39] have demonstrated the power of appearance-based methods both in ease of 
implementation and in accuracy. Yet these methods suffer from an important 
drawback: recognition of an object (or face) under a particular pose and lighting 
can be performed reliably provided that object has been previously seen under 
similar circumstances. In other words, these methods in their original form have 
no way of extrapolating to novel viewing conditions. Yet, if one enumerates 
all possible poses and permutes these with all possible illumination conditions, 
things get out of hand quite quickly. This raises the question: Is there some un- 
derlying “generative” structure to the set of images of an object under varying 
illumination and pose such that to create the set, the object does not have to be 
viewed under all possible conditions? 

In this chapter we address only part of this question, restricting our investigation to varying illumination. In particular, if an image with n pixels is treated as a point in $\mathbb{R}^n$, what is the set of all images of an object under varying illumination? Is this set an incredibly complex, but low-dimensional manifold in the image space? Or does the set have a simple, predictable structure? If the object is convex in shape and has a Lambertian reflectance function, can a finite number of images characterize this set? If so, how many images are needed?

[Figure 1: Original Images (top row); Edge Maps (bottom row)]

Fig. 1. Effects of Variability in Illumination: The top two images show the same face seen under different illumination conditions. The bottom two images show edge maps of the top two images. Even though the change in light source direction is less than 45°, the change in the resulting image is dramatic.

The image formation process for a particular object can be viewed as a function of pose and lighting. Since an object’s pose can be represented by a point in $\mathbb{R}^3 \times SO(3)$ (a six dimensional manifold), the set of n-pixel images of an object under constant illumination, but over all possible poses, is at most six dimensional. Murase and Nayar take advantage of this structure when constructing appearance manifolds [27]. However, the variability due to illumination may be much larger as the set of possible lighting conditions is infinite dimensional.

Arbitrary illumination can be modeled as a scalar function on a four dimensional manifold of light rays [25]. However, without limiting assumptions about the possible light sources, the bidirectional reflectance density functions, or object geometry, it is difficult to draw limiting conclusions about the set of images. For example, the image of a perfect mirror can be anything. Alternatively, if the light source is composed of a collection of independent lasers with one per pixel (which is admissible under [25]), then an arbitrary image of any object can be constructed by appropriately selecting the lasers’ intensities.

Nonetheless, we will show that the set of images of an object with arbitrary reflectance functions seen under arbitrary illumination conditions is a convex cone in $\mathbb{R}^n$, where n is the number of pixels in each image. Furthermore, if the object has a convex shape and a Lambertian reflectance function and is illuminated by an arbitrary number of point light sources at infinity, this cone is polyhedral and can be determined from as few as three images. In addition, we will show that while the dimension of the illumination cone equals the number of distinct surface normals, the shape of the cone is “flat,” i.e. the cone lies near a low dimensional linear subspace of the image space. When the object is non-convex and non-Lambertian, methods for approximating the cone are presented. Throughout the chapter, empirical investigations are presented to complement the theoretical arguments. In particular, experimental results are provided which support the validity of the illumination cone representation and the associated propositions on the illumination cone’s dimension and shape. Note that some results in this chapter were originally presented in [4, 14].

The effectiveness of these algorithms and the cone representation is demon- 
strated within the context of face recognition - it has been observed by Moses, 
Adini and Ullman that the variability in an image due to illumination is often 
greater than that due to a change in the person’s identity [26]. Figure 9 shows the 
variability for a single individual. It has also been observed that methods for face 
recognition based on finding local image features and using their geometric rela- 
tion are generally ineffective [6]. Hence, faces provide an interesting and useful 
class of objects for testing the power of the illumination cone representation. 

In this chapter we empirically compare this new method to a number of 
popular techniques and representations such as correlation [6] and Eigenfaces [24, 
39] as well as more recently developed techniques such as distance to linear 
subspace [3, 15, 29, 34]; the latter technique has been shown to be much less 
sensitive to illumination variation than the former. However, these methods also 
break down as shadowing becomes very significant. As we will see, the presented 
algorithm based on the illumination cone outperforms all of these methods on a 
database of 660 images. It should be noted that our objective in this work is to 
focus solely on the issue of illumination variation whereas other approaches have 
been more concerned with issues related to large image databases, face finding, 
pose, and facial expressions. 

We are hopeful that the proposed illumination representation will prove useful for 3-D object recognition under more general conditions. For problems where pose is unknown, we envision marrying the illumination cone representation with a low-dimensional set of image coordinate transformations [40] or with the appearance manifold work of [27], thus allowing both illumination and pose variation. For problems in which occlusion and non-rigid motion cannot be discounted, we envision breaking the image of an object into sub-regions and building “illumination sub-cones.” These illumination sub-cones could then be glued together in a manner similar to the recent “body plans” work of [12].

2 The Illumination Cone 

In this section, we develop the illumination cone representation. To start, we 
make two simplifying assumptions: first, we assume that the surfaces of objects 
have Lambertian reflectance functions; second, we assume that the shape of 
an object’s surface is convex. While the majority of the propositions are based 
upon these two assumptions, we will relax them in Section 2.2 and show that the 
set of images is still a convex cone. In addition, the empirical investigations of 
Section 2.4 will demonstrate the validity of the illumination cone representation 
by presenting results on images of objects which have neither purely Lambertian 
reflectance functions nor convex shapes. The cone representation will then be 
used for face recognition in Section 5. 



2.1 Illumination Cones for Convex Lambertian Surfaces 

To begin, let us assume a Lambertian model for reflectance with a single point light source at infinity. Let $x \in \mathbb{R}^n$ denote an image with n pixels. Let $B \in \mathbb{R}^{n \times 3}$ be a matrix where each row of $B$ is the product of the albedo with the inward pointing unit normal for a point on the surface projecting to a particular pixel; here we effectively approximate a smooth surface by a faceted one and assume that the surface normals for the set of points projecting to the same image pixel are identical.

Let $s \in \mathbb{R}^3$ be a column vector signifying the product of the light source strength with the unit vector for the light source direction. Thus, a convex object with surface normals and albedo given by $B$, seen under illumination $s$, produces an image $x$ given by the following equation

$$x = \max(Bs, 0), \qquad (1)$$

where $\max(\cdot, 0)$ zeros all negative components of the vector $Bs$ [18]. Note that the negative components of $Bs$ correspond to the shadowed surface points and are sometimes called attached shadows [33]. Also, note that we have assumed that the object’s shape is convex at this point to avoid cast shadows, i.e. shadows that the object casts on itself.
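As a minimal sketch of Eq. (1), the image formation model can be written in a few lines of NumPy. The function name `render_single_source` and the four-row matrix `B` below are hypothetical and chosen only for illustration; they are not taken from the chapter.

```python
import numpy as np

def render_single_source(B, s):
    """Image of a convex Lambertian object under one distant point source.

    B : (n, 3) array, row = albedo times unit surface normal for one pixel.
    s : (3,) array, light source strength times unit direction.
    Implements x = max(Bs, 0): negative entries are attached shadows.
    """
    return np.maximum(B @ s, 0.0)

# Hypothetical 4-pixel object (illustrative values only).
B = np.array([
    [0.0, 0.0, 1.0],
    [0.6, 0.0, 0.8],
    [-0.6, 0.0, 0.8],
    [0.0, 1.0, 0.0],
])
s = np.array([1.0, 0.0, 0.0])

x = render_single_source(B, s)   # third pixel falls into attached shadow
```

Only the second pixel receives light in this toy configuration; the third has a negative product $b \cdot s$ and is clipped to zero.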

If the object is illuminated by $k$ point light sources at infinity, the image $x$ is given by the superposition of images which would have been produced by the individual light sources, i.e.

$$x = \sum_{i=1}^{k} \max(Bs_i, 0),$$

where $s_i$ is a single light source. Note that extended light sources at infinity can be handled by allowing an infinite number of point light sources (i.e., the sum becomes an integral).
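The superposition above can be sketched as follows. The point worth noting is that shadows are clipped per source before summing; clipping the summed source vector instead gives a different (and wrong) image. The helper `render_multi_source` and the numerical values are assumptions made for this illustration.

```python
import numpy as np

def render_multi_source(B, sources):
    """Superposition of single-source images: x = sum_i max(B s_i, 0)."""
    S = np.asarray(sources, dtype=float).reshape(-1, 3)   # (k, 3)
    # Column i of B @ S.T is B s_i; clip each column, then add them up.
    return np.maximum(B @ S.T, 0.0).sum(axis=1)

# Hypothetical 3-pixel object and two opposing sources.
B = np.array([[0.0, 0.0, 1.0],
              [0.6, 0.0, 0.8],
              [-0.6, 0.0, 0.8]])
s1 = np.array([1.0, 0.0, 0.0])
s2 = np.array([-1.0, 0.0, 0.0])

x_both = render_multi_source(B, [s1, s2])
x_naive = np.maximum(B @ (s1 + s2), 0.0)   # not the same thing: sources cancel
```

Here `x_both` lights the second and third pixels, whereas `x_naive` is identically zero because the two source vectors cancel before the clipping is applied.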

The product of $B$ with all possible light source directions and strengths sweeps out a subspace in the n-dimensional image space [17, 29, 33]; we call the subspace created by $B$ the illumination subspace $\mathcal{L}$, where

$$\mathcal{L} = \{x \mid x = Bs, \ \forall s \in \mathbb{R}^3\}.$$

Note that the dimension of $\mathcal{L}$ equals the rank of $B$. Since $B$ is an $n \times 3$ matrix, $\mathcal{L}$ will in general be a 3-D subspace, and we will assume it to be so in the remainder of the chapter. When the surface has fewer than three linearly independent surface normals, $B$ does not have full rank. For example, in the case of a cylindrical object, both the rank of $B$ and dimension of $\mathcal{L}$ are two. Likewise, in the case of a planar object, both the rank and dimension are one.
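The rank statements for planar and cylindrical objects can be checked numerically. The following sketch builds three hypothetical matrices `B_plane`, `B_cyl`, and `B_gen` (names and values are assumptions, not from the chapter) and reports the dimension of the corresponding illumination subspace.

```python
import numpy as np

def illumination_subspace_dim(B, tol=1e-10):
    """Dimension of L = {Bs}, i.e. the numerical rank of B."""
    return int(np.linalg.matrix_rank(B, tol=tol))

theta = np.linspace(0.2, np.pi - 0.2, 8)    # angles around a cylinder
albedo = np.linspace(0.5, 1.0, 8)           # arbitrary per-pixel albedos

# Plane: every pixel shares the normal (0, 0, 1).
B_plane = albedo[:, None] * np.array([0.0, 0.0, 1.0])
# Cylinder with axis along y: normals stay in the x-z plane.
B_cyl = albedo[:, None] * np.column_stack(
    [np.cos(theta), np.zeros_like(theta), np.sin(theta)])
# Adding one normal with a y component gives three independent normals.
B_gen = np.vstack([B_cyl, [0.0, 1.0, 0.0]])

dims = [illumination_subspace_dim(M) for M in (B_plane, B_cyl, B_gen)]
```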

When a single light source is parallel with the camera’s optical axis, all visible points on the surface are illuminated, and consequently, all pixels in the image have non-zero values. The set of images created by scaling the light source strength and moving the light source away from the direction of the camera’s optical axis such that all pixels remain illuminated can be found as the relative interior of a set $\mathcal{L}_0$ defined by the intersection of $\mathcal{L}$ with the non-negative orthant^ of $\mathbb{R}^n$.

Lemma 1. The set of images $\mathcal{L}_0$ is a convex cone in $\mathbb{R}^n$.

Proof. $\mathcal{L}_0 = \mathcal{L} \cap \{x \mid x \in \mathbb{R}^n$, with all components of $x \ge 0\}$. Both $\mathcal{L}$ and the non-negative orthant are convex. For the definition of convexity and the definition of a cone, see [7, 31]. Because the intersection of two convex sets is convex, it follows that $\mathcal{L}_0$ is convex.

Because $\mathcal{L}$ is a linear subspace, if $x \in \mathcal{L}$ then $\alpha x \in \mathcal{L}$. And, if $x$ has all components non-negative, then $\alpha x$ has all components non-negative for every $\alpha \ge 0$. Therefore $\alpha x \in \mathcal{L}_0$. So it follows that $\mathcal{L}_0$ is a cone.

As we move the light source direction further from the camera’s optical axis, 
points on the object will fall into shadow. Naturally, which pixels are the image 
of shadowed or illuminated surface points depends on where we move the light 
source direction. If we move the light source all the way around to the back of 
the object so that the camera’s optical axis and the light source are pointing in 
opposite directions, then all pixels are in shadow. 

Let us now consider all possible light source directions, representing each direction by a point on the surface of the sphere; we call this sphere the illumination sphere. For a convex object, the set of light source directions for which a given facet (i.e. pixel in the image) is illuminated corresponds to an open hemisphere of the illumination sphere; the set of light source directions for which the facet is shadowed corresponds to the other hemisphere of points. A great circle on the illumination sphere divides these sets.

^ By orthant we mean the high-dimensional analogue to quadrant, i.e., the set $\{x \mid x \in \mathbb{R}^n$, with certain components of $x \ge 0$ and the remaining components of $x < 0\}$. By non-negative orthant we mean the set $\{x \mid x \in \mathbb{R}^n$, with all components of $x \ge 0\}$.

Fig. 2. The Illumination Sphere: The set of all light source directions can be represented by points on the surface of a sphere; we call this sphere the illumination sphere. Great circles corresponding to individual pixels divide the illumination sphere into cells of different shadowing configurations. The arrows indicate the hemisphere of light directions for which the particular pixel is illuminated. The cell of light source directions which illuminate all pixels is denoted by $S_0$. The light source directions within $S_0$ produce $\mathcal{L}_0$, the set of images in which all pixels are illuminated. Each of the other cells produces the $\mathcal{L}_i$, $0 < i \le n(n-1)+1$. The extreme rays of the cone are given by the images produced by light sources at the intersection of two circles.

For each of the n pixels in the image, there is a corresponding great circle on the illumination sphere. The collection of great circles carves up the surface of the illumination sphere into a collection of cells $S_i$. See Figure 2. The collection of light source directions contained within a cell $S_i$ on the illumination sphere produces a set of images, each with the same pixels in shadow and the same pixels illuminated; we say that these images have the same “shadowing configurations.” Different cells produce different shadowing configurations. Note that this partitioning is reminiscent of the partitioning of the viewpoint space in the construction of orthographic projection aspect graphs of convex polyhedral objects [41].

We denote by $S_0$ the cell on the illumination sphere containing the collection of light source directions which produce images with all pixels illuminated. Thus, the collection of light source directions from the interior and boundary of $S_0$ produces the set of images $\mathcal{L}_0$. To determine the set of images produced by another cell on the illumination sphere, we need to return to the illumination subspace $\mathcal{L}$.

The illumination subspace $\mathcal{L}$ not only slices through the non-negative orthant of $\mathbb{R}^n$, but other orthants in $\mathbb{R}^n$ as well. Let $\mathcal{L}_i$ be the intersection of the illumination subspace $\mathcal{L}$ with an orthant $i$ in $\mathbb{R}^n$ through which $\mathcal{L}$ passes. Certain components of $x \in \mathcal{L}_i$ are always negative and others always greater than or equal to zero. Each $\mathcal{L}_i$ has a corresponding cell of light source directions $S_i$ on the illumination sphere. Note that $\mathcal{L}$ does not slice through all of the $2^n$ orthants in $\mathbb{R}^n$, but at most $n(n-1)+2$ orthants (see the proof of Proposition 1). Thus, there are at most $n(n-1)+2$ sets $\mathcal{L}_i$, each with a corresponding cell $S_i$ on the illumination sphere.

The set of images produced by the collection of light source directions from a cell $S_i$ other than $S_0$ can be found as a projection $P_i$ of all points in a particular set $\mathcal{L}_i$. The projection $P_i$ is such that it leaves the non-negative components of $x \in \mathcal{L}_i$ untouched, while the negative components of $x$ become zero. We denote the projected set by $P_i(\mathcal{L}_i)$.

Lemma 2. The set of images $P_i(\mathcal{L}_i)$ is a convex cone in $\mathbb{R}^n$.

Proof. By the same argument used in the proof of Lemma 1, $\mathcal{L}_i$ is a convex cone. Since the linear projection of a convex cone is itself a convex cone, $P_i(\mathcal{L}_i)$ is a convex cone.

Since $P_i(\mathcal{L}_i)$ is the projection of $\mathcal{L}_i$, it is at most three dimensional. Each $P_i(\mathcal{L}_i)$ is the set of all images such that certain facets are illuminated, and the remaining facets are shadowed. The dual relation between $P_i(\mathcal{L}_i)$ and $S_i$ can be concisely written as $P_i(\mathcal{L}_i) = \{\alpha \max(Bs, 0) : \alpha \ge 0, \ s \in S_i\}$ and $S_i = \{s : |s| = 1, \ \max(Bs, 0) \in P_i(\mathcal{L}_i)\}$. Let $P_0$ be the identity, so that $P_0(\mathcal{L}_0) = \mathcal{L}_0$ is the set of all images such that all facets are illuminated. The number of possible shadowing configurations is the number of orthants in $\mathbb{R}^n$ through which the illumination subspace $\mathcal{L}$ passes, which in turn is the same as the number of sets $P_i(\mathcal{L}_i)$.

Proposition 1. The number of shadowing configurations is at most $m(m-1)+2$, where $m \le n$ is the number of distinct surface normals.

Proof. Each of the n pixels in the image has a corresponding great circle on the illumination sphere, but only $m \le n$ of the great circles are distinct. The collection of $m$ distinct great circles carves the surface of the illumination sphere into cells. Each cell on the illumination sphere corresponds to a particular set of images $P_i(\mathcal{L}_i)$. Thus, the problem of determining the number of shadowing configurations is the same as the problem of determining the number of cells. If every vertex on the illumination sphere is formed by the intersection of only two of the $m$ distinct great circles (i.e., if no more than two surface normals are coplanar), then it can be shown by induction that the illumination sphere is divided into $m(m-1)+2$ cells. If a vertex is formed by the intersection of three or more great circles, there are fewer cells.
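The induction mentioned in the proof is not spelled out in the text; one way to carry it out, under the stated general-position assumption, is sketched here. Write $N(m)$ (a symbol introduced only for this sketch) for the number of cells cut by $m$ distinct great circles. A single great circle gives two hemispheres. The $m$-th circle meets each of the $m-1$ earlier circles in two antipodal points, all distinct by assumption, so it is divided into $2(m-1)$ arcs, and each arc splits one existing cell in two.

```latex
\begin{align*}
N(1) &= 2, \\
N(m) &= N(m-1) + 2(m-1) \qquad (m \ge 2), \\
N(m) &= 2 + 2\sum_{j=1}^{m-1} j \;=\; 2 + m(m-1).
\end{align*}
```

For instance, three circles in general position give $N(3) = 8$ cells, the familiar division of the sphere into octants.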




Thus, the set $\mathcal{U}$ of images of a convex Lambertian surface created by varying the direction and strength of a single point light source at infinity is given by the union of at most $n(n-1)+2$ convex cones, i.e.,

$$\mathcal{U} = \{x \mid x = \max(Bs, 0), \ \forall s \in \mathbb{R}^3\} = \bigcup_{i=0}^{n(n-1)+1} P_i(\mathcal{L}_i). \qquad (2)$$

From this set, we can construct the set $\mathcal{C}$ of all possible images of a convex Lambertian surface created by varying the direction and strength of an arbitrary number of point light sources at infinity,

$$\mathcal{C} = \{x : x = \sum_{i=1}^{k} \max(Bs_i, 0), \ \forall s_i \in \mathbb{R}^3, \ \forall k \in \mathbb{Z}^+\},$$

where $\mathbb{Z}^+$ is the set of positive integers.

Proposition 2. The set of images $\mathcal{C}$ is a convex cone in $\mathbb{R}^n$.

Proof. The proof that $\mathcal{C}$ is a cone follows trivially from the definition of $\mathcal{C}$. To prove that $\mathcal{C}$ is convex, we appeal to a proposition for convex cones which states that a cone $\mathcal{C}$ is convex iff $x_1 + x_2 \in \mathcal{C}$ for any two points $x_1, x_2 \in \mathcal{C}$ [7]. So the proof that $\mathcal{C}$ is convex also follows trivially from the above definition of $\mathcal{C}$.
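For readers who want the two "trivial" steps written out, a short sketch follows. The source vectors $t_j$ and the count $l$ are notation introduced here for the second image; they do not appear in the chapter.

```latex
\begin{align*}
x_1 &= \sum_{i=1}^{k} \max(Bs_i, 0), \qquad
x_2 = \sum_{j=1}^{l} \max(Bt_j, 0), \\
\alpha x_1 &= \sum_{i=1}^{k} \max\bigl(B(\alpha s_i), 0\bigr)
  \quad \text{for } \alpha \ge 0
  \quad \text{(scaling the sources scales the image)}, \\
x_1 + x_2 &= \sum_{i=1}^{k} \max(Bs_i, 0) + \sum_{j=1}^{l} \max(Bt_j, 0)
  \quad \text{(an image under the } k + l \text{ combined sources)}.
\end{align*}
```

The second line uses $\max(\alpha v, 0) = \alpha \max(v, 0)$ for $\alpha \ge 0$; the third simply pools the two source lists into one.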

We call C the illumination cone. Every object has its own illumination cone. 
Note that each point in the cone is an image of the object under a particular 
lighting configuration, and the entire cone is the set of images of the object under 
all possible configurations of point light sources at infinity. 

Proposition 3. The illumination cone $\mathcal{C}$ of a convex Lambertian surface can be determined from as few as three images, each taken under a different, but unknown light source direction.

Proof. The illumination cone $\mathcal{C}$ is completely determined by the illumination subspace $\mathcal{L}$. If the matrix of surface normals scaled by albedo $B$ were known, then this would determine $\mathcal{L}$ uniquely, as $\mathcal{L} = \{x \mid x = Bs, \ \forall s \in \mathbb{R}^3\}$. Yet, from images produced by differing, but unknown light source directions, we cannot determine $B$ uniquely. To see this, note that for any arbitrary invertible $3 \times 3$ linear transformation $A \in GL(3)$,

$$Bs = (BA)(A^{-1}s) = B^* s^*.$$

In other words, the same image is produced when the albedo and surface normals are transformed by $A$, while the light source is transformed by $A^{-1}$. Therefore, without knowledge of the light source directions, we can only recover $B^*$ where $B^* = BA$, see [10, 17]. Nonetheless $B^*$ is sufficient for determining the subspace $\mathcal{L}$: it is easy to show that $\mathcal{L} = \{x \mid x = B^* s, \ \forall s \in \mathbb{R}^3\} = \{x \mid x = Bs, \ \forall s \in \mathbb{R}^3\}$, see [33].
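The ambiguity in the proof can be illustrated numerically. In the sketch below, `B` is a random stand-in for the albedo-scaled normals and `A` is an arbitrary invertible matrix; both are assumptions made for the example, as is the helper `projector`. The check confirms that $B$ and $B^* = BA$ produce the same image for matched sources and span the same subspace.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.normal(size=(50, 3))            # stand-in for albedo-scaled normals
A = np.array([[2.0, 1.0, 0.0],          # hypothetical invertible matrix (det = 3.15)
              [0.0, 1.0, 0.5],
              [0.3, 0.0, 1.5]])
s = np.array([0.2, -0.1, 1.0])

B_star = B @ A
s_star = np.linalg.solve(A, s)          # A^{-1} s without forming the inverse

# Same image before clipping, hence also after applying max(., 0).
same_image = np.allclose(B @ s, B_star @ s_star)

def projector(M):
    """Orthogonal projector onto the column space of M."""
    Q, _ = np.linalg.qr(M)              # reduced QR: Q has orthonormal columns
    return Q @ Q.T

# Equal projectors mean equal column spaces, i.e. the same subspace L.
same_subspace = np.allclose(projector(B), projector(B_star))
```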







Thus, for a convex object with Lambertian reflectance, we can determine its appearance under arbitrary illumination from as few as three images of the object - knowledge of the light source strength or direction is not needed, see also [33]. To determine the illumination cone $\mathcal{C}$, we simply need to determine the illumination subspace $\mathcal{L}$. In turn, we can choose any three images from the set $\mathcal{L}_0$, each taken under a different lighting direction, as its basis vectors. Naturally, if more images are available, they can be combined to find the best rank three approximation to $\mathcal{L}$ using singular value decomposition (SVD).
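A minimal sketch of the SVD step, on synthetic data, is given below. The function `estimate_illumination_subspace`, the synthetic surface, the five source vectors, and the noise level are all assumptions chosen so that every pixel stays illuminated; real input would be registered photographs stacked as columns.

```python
import numpy as np

def estimate_illumination_subspace(images, rank=3):
    """Best rank-`rank` orthonormal basis for the columns of `images`.

    images : (n_pixels, n_images) array of shadow-free images.
    Returns (basis, singular_values); basis has shape (n_pixels, rank).
    """
    X = np.asarray(images, dtype=float)
    X = X / np.linalg.norm(X, axis=0, keepdims=True)   # unit-length images
    U, sing, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :rank], sing

# Synthetic surface: normals tilted at most 60 degrees from the viewing axis.
rng = np.random.default_rng(1)
n = 200
z = rng.uniform(0.5, 1.0, n)
phi = rng.uniform(0.0, 2.0 * np.pi, n)
r = np.sqrt(1.0 - z**2)
normals = np.column_stack([r * np.cos(phi), r * np.sin(phi), z])
B = rng.uniform(0.3, 1.0, n)[:, None] * normals

# Five near-frontal sources, so no pixel is shadowed.
S = np.array([[0.0, 0.0, 1.0], [0.2, 0.0, 1.0], [0.0, 0.2, 1.0],
              [-0.2, 0.1, 1.0], [0.1, -0.2, 1.0]])
X = B @ S.T
X_noisy = X + 1e-6 * rng.normal(size=X.shape)

basis, sing = estimate_illumination_subspace(X_noisy)
# Relative residual of B after projecting onto the estimated subspace.
resid = np.linalg.norm(B - basis @ (basis.T @ B)) / np.linalg.norm(B)
```

The recovered basis is only defined up to an invertible $3 \times 3$ transformation, which is consistent with the ambiguity $B^* = BA$ discussed in the proof of Proposition 3.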

We should point out that for many convex surfaces the cone can be constructed from as few as three images; however, this is not always possible. If the object has surface normals covering the Gauss sphere, then there is only one light source direction - the viewing direction - such that the entire visible surface is illuminated. For any other light source direction, some portion of the surface is shadowed. To determine $\mathcal{L}$, each point on the surface of the object must be illuminated in at least three images; for this to be true over the entire visible surface, as many as five images may be required. See [20] and Section 5.1 for algorithms for determining $\mathcal{L}$ from images with shadowed pixels.

What may not be immediately obvious is that any point within the cone $\mathcal{C}$ (including the boundary points) can be found as a convex combination of the rays (images) produced by light source directions lying at the $m(m-1)$ intersections of the great circles on the illumination sphere. Each of these $m(m-1)$ rays (images) is an extreme ray of the convex cone, because it cannot be expressed as a convex combination of two other images in the cone. Furthermore, because the cone is constructed from a finite number of extreme rays (images), the cone is polyhedral.

These propositions and observations suggest the following algorithm for con- 
structing the illumination cone from three or more images: 



Illumination Subspace Method: Gather images of the object under varying illumination without shadowing and use these images to estimate the three-dimensional illumination subspace $\mathcal{L}$. After normalizing the images to be of unit length, singular value decomposition (SVD) can be used to estimate the best orthogonal basis in a least squares sense. From the illumination subspace $\mathcal{L}$, the extreme rays defining the illumination cone $\mathcal{C}$ are then computed. Recall that an extreme ray is an image created by a light source direction lying at the intersection of two or more great circles. If there are $m$ independent surface normals, there can be as many as $m(m-1)$ extreme rays (images). Let $b_i$ and $b_j$ be rows of $B$ with $i \ne j$; the extreme rays are given by

$$x_{ij} = \max(Bs_{ij}, 0) \qquad (3)$$

where

$$s_{ij} = b_i \times b_j. \qquad (4)$$
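Equations (3) and (4) translate almost directly into code. The sketch below enumerates ordered pairs $i \ne j$, so that both $b_i \times b_j$ and its negative appear, and returns the candidate extreme rays as rows. The function name `extreme_rays` and the four-row `B` are hypothetical; for a real image with many pixels this brute-force loop is quadratic in the number of distinct normals and would need pruning.

```python
import numpy as np
from itertools import permutations

def extreme_rays(B, tol=1e-12):
    """Candidate extreme rays x_ij = max(B s_ij, 0) with s_ij = b_i x b_j, i != j."""
    rays = []
    for i, j in permutations(range(B.shape[0]), 2):   # ordered pairs: +s and -s
        s_ij = np.cross(B[i], B[j])
        if np.linalg.norm(s_ij) <= tol:               # parallel normals: skip
            continue
        x_ij = np.maximum(B @ s_ij, 0.0)
        x_ij[[i, j]] = 0.0    # pixels i and j lie exactly on the shadow boundary
        rays.append(x_ij)
    return np.array(rays)

# Hypothetical object with four normals, no three of them coplanar.
B = np.array([[0.0, 0.0, 1.0],
              [0.6, 0.0, 0.8],
              [0.0, 0.6, 0.8],
              [-0.48, -0.36, 0.8]])
rays = extreme_rays(B)        # one row per ordered pair (i, j)
```

With $m = 4$ normals in general position this yields $m(m-1) = 12$ candidates. Some candidates may be the zero image (a source behind every facet), which is the apex of the cone and not a ray; a practical implementation would drop those.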







In Section 2.4, we use this method to experiment with images of real objects; we use a small number of images to build the illumination subspace $\mathcal{L}$ and then produce sample images from the illumination cone $\mathcal{C}$. To reduce storage and computational requirements for applications using the cone, the images can be projected down to a low dimensional subspace; any image in the projected cone can be found as convex combinations of the projected extreme rays. Note, however, that some of the projected extreme rays are redundant since an extreme ray may project to the interior of the projected cone. As will be seen in the experiments of Section 3.4, the illumination cones of real objects do lie near a low dimensional subspace; thus dimensionality reduction by linear projection may be justified.



A Two-Dimensional Example. To illustrate the relationship between an object and its illumination cone, consider the simplified 2-D example in Fig. 3. An object composed of three facets is shown in Fig. 3.a. For facet $i$, the product of the albedo and surface normal is given by $b_i \in \mathbb{R}^2$. In this 2-D world, the direction of a light source at infinity can be represented as a point on a circle.

Let us now consider a camera observing the three facets from above such that each facet projects to one pixel, yielding an image $x = (x_1, x_2, x_3)^T \in \mathbb{R}^3$. $\mathcal{L}$ is then a 2-D linear subspace of $\mathbb{R}^3$, and the set of images from a single light source such that all pixels are illuminated, $\mathcal{L}_0 \subset \mathcal{L}$, is the 2-D convex cone shown in Figure 3.b. The left edge of $\mathcal{L}_0$, where $x_3 = 0$, corresponds to the light source direction where Facet 3 just goes into shadow, and similarly the right edge of $\mathcal{L}_0$, where $x_1 = 0$, corresponds to the light source direction where Facet 1 just goes into shadow. Now, for a single light source, the set of images is formed by projecting $\mathcal{L}$ onto the positive orthant as shown in Figure 3.c. Note, for example, that the 2-D cone $P_1(\mathcal{L}_1)$ corresponds to the set of images in which Facets 1 and 2 are illuminated while Facet 3 is in shadow, and one of the sets $P_i(\mathcal{L}_i)$ is a 1-D ray corresponding to the set of images with Facet 1 illuminated and Facets 2 and 3 shadowed. The union $\bigcup_i P_i(\mathcal{L}_i)$ defines the walls of the illumination cone $\mathcal{C}$, and the entire cone is formed by taking convex combinations of images on the walls.

As seen in Figure 3.d, the set of light source directions, represented here by a circle, can be partitioned into regions $S_i$ such that all images produced by light sources within a region have the same shadowing configurations. That is, $S_i = \{s : |s| = 1, \ \max(Bs, 0) \in P_i(\mathcal{L}_i)\}$. The corresponding partitioning of light source directions is shown in Figure 3.a.



2.2 Illumination Cones for Arbitrary Objects 

In the previous sub-section, we assumed that the objects were convex in shape and had Lambertian reflectance functions. The central result was that the set of images of the object under all possible illumination conditions formed a convex cone in the image space and that this illumination cone can be constructed from as few as three images. Yet, most objects are non-convex in shape and have reflectance functions which can be better approximated by more sophisticated physical [30, 36, 38] and phenomenological [22] models. The question again arises: What can we say about the set of images of an object with a non-convex shape and a non-Lambertian reflectance function?

Fig. 3. A 2-D Example: a. A surface with three facets is observed from above and produces an image with pixels $x_1$, $x_2$ and $x_3$. b. The linear subspace $\mathcal{L}$ and its intersection with the positive quadrant $\mathcal{L}_0$. c. The “walls” of the cone $P_i(\mathcal{L}_i)$ corresponding to images formed by a single light source. The illumination cone $\mathcal{C}$ is formed by all convex combinations of images lying on the walls. d. The geometry of facets leads to a partitioning of the illumination circle.

The proof of Proposition 2 required no assumptions about the shape of the 
object, the nature of the light sources, or the reflectance function for the object’s 
surface. Consequently, we can state a more general proposition about the set of 
images of an object under varying illumination: 










Proposition 4. The set of n-pixel images of any object, seen under all possible lighting conditions, is a convex cone in $\mathbb{R}^n$.

Therefore, even for a non-convex object with a non-Lambertian reflectance function, the set of images under all possible lighting conditions still forms a convex cone in the image space. This result is in some sense trivial, arising from the superposition property of illumination: the image of an object produced by two light sources is simply the addition of the two images produced by the sources individually.

It is doubtful that the illumination cone for such objects can be constructed 
from as few as three images. This is not due to the non-convexity of objects and 
the shadows they cast. The structure of objects with Lambertian reflectance, but 
non-convex shapes, can be recovered up to a “generalized bas-relief” transfor- 
mation from as few as three images [2]. From this, it is possible to determine the 
cast shadows exactly. Rather, the difficulty is due to the fact that the reflectance 
function is unknown. To determine the reflectance function exactly could take an 
infinite number of images. However, the Illumination Subspace Method devel- 
oped in Section 2.1 can be used to approximate the cone, as will be seen in the 
empirical investigation of Section 2.4; such an approximation for a non-convex, 
non-Lambertian surface is used in the face recognition experiment in Section 5. 
An alternative method for approximating the cone is presented below: 

Sampling Method: Illuminate the object by a series of light source 
directions which evenly sample the illumination sphere. The resulting 
set of images is then used as the set of extreme rays of the approximate 
cone. 

Note that this approximate cone is a subset of the true cone and so any image 
contained within the approximate cone is a valid image. The Sampling Method 
has its origins in the linear subspace method proposed by Hallinan [15]; yet, it 
differs in that the illumination cone restricts the images to be convex - not linear 
- combinations of the extreme rays. This method is a natural way of extending 
the appearance manifold method of Murase and Nayar to account for multiple 
light sources and shadowing [27]. 
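
As a rough illustration of how such an approximate cone might be used, the sketch below measures how far a test image lies from the cone spanned by a set of sampled images. It is a minimal sketch under stated assumptions, not the procedure used in this chapter: the function name `distance_to_cone`, the random stand-in images and the plain projected-gradient solver are all hypothetical choices made for the example. The only point it carries over from the text is that the combination weights are constrained to be non-negative rather than free.

```python
import numpy as np

def distance_to_cone(rays, x, n_iter=20000):
    """Distance from image x to the cone {rays @ a : a >= 0}.

    rays: (n_pixels, k) matrix whose columns are the sampled images
    (the extreme rays of the approximate cone); x: (n_pixels,) test image.
    Solved by projected gradient descent on 0.5 * ||rays @ a - x||^2.
    """
    rays = np.asarray(rays, dtype=float)
    x = np.asarray(x, dtype=float)
    step = 1.0 / np.linalg.norm(rays, 2) ** 2    # 1 / Lipschitz constant
    a = np.zeros(rays.shape[1])
    for _ in range(n_iter):
        grad = rays.T @ (rays @ a - x)
        a = np.maximum(a - step * grad, 0.0)     # project onto a >= 0
    return np.linalg.norm(rays @ a - x), a

rng = np.random.default_rng(0)
rays = rng.uniform(0.0, 1.0, size=(20, 5))       # 5 hypothetical sampled images
inside = rays @ np.array([0.5, 0.0, 2.0, 0.0, 1.0])   # non-negative combination
d_inside, _ = distance_to_cone(rays, inside)     # close to zero
d_outside, _ = distance_to_cone(rays, -inside)   # negated image is far from the cone
```

A linear subspace method would accept `-inside` as readily as `inside`; the non-negativity constraint is what separates the two here.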



2.3 Illumination Cones for Non-convex Lambertian Surfaces 

While the sampling method provides the means to approximate the cone for
objects with arbitrary geometry and reflectance functions and illuminated by
multiple light sources, it is necessary to have observed the object under many
lighting conditions to obtain a good approximation. On the other hand, if the
object is convex and Lambertian, the illumination subspace method can be used to
construct the entire cone from only three images. Here we consider an intermediate
situation where the surface is Lambertian but non-convex. Most significantly,
non-convex objects can cast shadows upon themselves. Whereas attached shadows
are defined by a local condition (see Equation 1), cast shadows are global in
nature. Nonetheless, from Section 2.2 we know that the set of images must still
be a cone; here we show how an approximation of this cone can be constructed
from as few as three images.

The illumination subspace method suggests a starting point for constructing
the illumination cone: gather three or more images of an object under varying
illumination without shadowing and use these images to estimate the
three-dimensional illumination subspace $\mathcal{L}$. Note that the estimated basis $B^*$ differs
from the true $B$ (whose rows are the surface normals scaled by the albedo) by an
unknown linear transformation, i.e., $B = B^*A$ where $A \in GL(3)$; for any light
source, $Bs = (BA)(A^{-1}s)$. Nonetheless, for a convex object, the extreme rays
defining the illumination cone C can be computed using Equations 3 and 4 using
$B^*$. For a non-convex object, cast shadows can cover significant portions of the
visible surface when the angle of the light source with respect to the viewing
direction is large (extreme illumination); see the images from Subsets 4 and 5 in
Fig. 9. Yet the image formation model (Eq. 1) used to develop the illumination
cone in Section 2.1 does not account for cast shadows.

It has been shown in [2, 42] and in this book that from multiple images 
where the light source directions are unknown, one can only recover a Lambertian 
surface up to a three-parameter family given by the generalized bas-relief (GBR) 
transformation. This family is a restriction on A, and it has the effect of scaling 
the relief (flattening or extruding) and introducing an additive plane. Since both 
shadows and shading are preserved under these transformations [2, 23], images
synthesized from a surface whose normal field is given by $B^*$ under light source
$s^*_j$ will have correct shadowing. Thus, to construct the extreme rays of the cone,
we first reconstruct a Lambertian surface (a height function plus albedo) from 
B*. This surface is not an approximation of the original surface, but rather a 
representative element of the orbit of the original surface under GBR. For a given 
light source direction s*, ray-tracing techniques can be used to determine which 
surface points lie in a cast shadow. Whereas for convex Lambertian objects, the 
illumination sphere is partitioned into $m(m-1)+2$ regions by $m$ great circles,
the illumination sphere will be partitioned by more complex curves for non- 
convex Lambertian objects, and so it is expected that there will be many more 
shadowing configurations. As such, it is unlikely that an exact representation of 
the cone could be used in practice. This approximate cone is a subset of the true 
cone when there is no imaging noise. 

These observations lead to the following steps for constructing an approxi- 
mation to the illumination cone of a non-convex Lambertian surface from a set 
of images taken under unknown lighting. 







Cast Shadow Method: 

1. Gather images of the object under varying illumination without 
shadowing. 

2. Estimate B* from training images. 

3. Reconstruct a surface up to GBR. 

4. For a set of light source directions that uniformly sample the sphere, 
use ray-tracing to synthesize images from the reconstructed surface 
that account for both cast and attached shadows. 

5. Use synthetic images as extreme rays of cone. 

More details of these steps as applied to face recognition will be provided in 
Section 5.1. 
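
Step 4 can be sketched for a surface stored as a height field. The code below is only an illustrative stand-in for a ray tracer: it marches one pixel at a time from each surface point toward the light and marks the point as cast-shadowed if the surface rises above the ray. The names `cast_shadow_mask` and `synthesize`, the pixel-unit height convention and the nearest-pixel sampling are assumptions made for this example.

```python
import numpy as np

def cast_shadow_mask(height, s):
    """Mark height-field pixels whose ray toward the light hits the surface.

    height: (h, w) array, height[y, x] in pixel units.
    s: light direction (sx, sy, sz), pointing from the surface to the light.
    """
    height = np.asarray(height, dtype=float)
    h, w = height.shape
    sx, sy, sz = np.asarray(s, dtype=float) / np.linalg.norm(s)
    horiz = max(abs(sx), abs(sy))
    if horiz == 0.0:                       # light along the viewing axis
        return np.zeros((h, w), dtype=bool)
    dx, dy, dz = sx / horiz, sy / horiz, sz / horiz   # one pixel per step
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    zs = height.copy()
    shadow = np.zeros((h, w), dtype=bool)
    for _ in range(max(h, w)):
        xs, ys, zs = xs + dx, ys + dy, zs + dz
        xi, yi = np.rint(xs).astype(int), np.rint(ys).astype(int)
        inside = (xi >= 0) & (xi < w) & (yi >= 0) & (yi < h)
        hit = np.zeros((h, w), dtype=bool)
        hit[inside] = height[yi[inside], xi[inside]] > zs[inside] + 1e-9
        shadow |= hit
    return shadow

def synthesize(B, height, s):
    """Image with attached shadows (max) and cast shadows (mask)."""
    attached = np.maximum(B @ np.asarray(s, dtype=float), 0.0)
    return np.where(cast_shadow_mask(height, s).ravel(), 0.0, attached)

# Toy scene: a wall of height 3 in column 2, lit from the +x side at 45 degrees.
height = np.zeros((5, 5))
height[:, 2] = 3.0
mask = cast_shadow_mask(height, (1.0, 0.0, 1.0))   # columns 0 and 1 are shadowed
```

Here `B` is whatever albedo-scaled normal matrix the caller supplies (one row per pixel, in row-major pixel order); computing it from the reconstructed surface is outside this sketch.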

2.4 An Empirical Investigation: Building Illumination Cones 

To demonstrate the power of these concepts, we have used the Illumination 
Subspace Method to construct the illumination cone for two different scenes: a 
human face and a desktop still life. To construct the cone for the human face, we 
used images from the Harvard Face Database [15], a collection of images of faces 
seen under a range of lighting directions. For the purpose of this demonstra- 
tion, we used the images of one person, taking six images with little shadowing 
and using singular value decomposition (SVD) to construct a 3-D basis for the 
illumination subspace $\mathcal{L}$. Note that this 3-D linear subspace differs from the
affine subspace constructed using the Karhunen-Loeve transform: the mean
image is not subtracted before determining the basis vectors as in the Eigenpicture
methods [24, 39]. 
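
The subspace estimate described above can be sketched in a few lines. This is a minimal sketch on synthetic data, not the code used for the experiment: the function name `illumination_subspace`, the random normals and the six near-frontal light sources are hypothetical. The detail taken from the text is that the SVD is applied to the raw image matrix, with no mean subtraction.

```python
import numpy as np

def illumination_subspace(X, k=3):
    """Rank-k factorization X ~ B_star @ S_star from the SVD of raw images.

    X: (n_pixels, n_images); columns are vectorized images. The mean image
    is deliberately not subtracted, so the result is a linear subspace.
    """
    U, sigma, Vt = np.linalg.svd(np.asarray(X, dtype=float), full_matrices=False)
    B_star = U[:, :k] * sigma[:k]      # (n_pixels, k) basis images
    S_star = Vt[:k]                    # (k, n_images) light sources in that basis
    return B_star, S_star

rng = np.random.default_rng(1)
normals = np.column_stack([0.3 * rng.normal(size=(50, 2)), np.ones(50)])
normals /= np.linalg.norm(normals, axis=1, keepdims=True)
B_true = rng.uniform(0.2, 1.0, size=(50, 1)) * normals     # albedo times normal
S_true = np.vstack([0.2 * rng.normal(size=(2, 6)), np.ones((1, 6))])
X = B_true @ S_true                    # six synthetic images without shadowing
B_star, S_star = illumination_subspace(X)
```

The recovered `B_star` spans the same subspace as `B_true` but is expressed in a different basis, which is the ambiguity by an invertible 3 x 3 matrix discussed in Section 2.3.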

The illumination subspace was then used to construct the illumination cone 
C. We generated novel images of the face as if illuminated by one, two, or three 
point light sources by randomly sampling the illumination cone. Rather than con- 
structing an explicit representation of the half-spaces bounding the illumination 
cone, we sampled $\mathcal{L}$, determined the corresponding orthant, and appropriately
projected the image onto the illumination cone. Images constructed under multi- 
ple light sources simply correspond to the superposition of the images generated 
by each of the light sources. 
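
One way to read this sampling procedure is sketched below: draw a random light source, form the corresponding point of the subspace, set its negative entries to zero, and add up the results for multiple sources. The function names, the intensity range and the random basis in the usage lines are assumptions for illustration; the actual sampling distribution used for the figures is not stated in the text.

```python
import numpy as np

def random_unit_vectors(rng, k):
    """k directions drawn uniformly from the unit sphere."""
    v = rng.normal(size=(k, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def sample_cone_image(B, rng, n_sources=1):
    """Image under n_sources random point sources, using basis B (n_pixels, 3)."""
    dirs = random_unit_vectors(rng, n_sources)
    intensities = rng.uniform(0.5, 1.5, size=n_sources)   # arbitrary range
    sources = dirs * intensities[:, None]
    # Per source: a point in the subspace, with negative entries set to zero.
    singles = np.maximum(B @ sources.T, 0.0)               # (n_pixels, n_sources)
    return singles.sum(axis=1)                             # superposition

rng = np.random.default_rng(7)
B_demo = rng.normal(size=(30, 3))       # hypothetical stand-in for an estimated basis
images = [sample_cone_image(B_demo, rng, n_sources=k) for k in (1, 2, 3)]
```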

The top two rows of Fig. 4 show all six low resolution images of a person’s
face that were used to construct the basis of the linear subspace $\mathcal{L}$. The bottom
row of Fig. 4 shows three basis images that span $\mathcal{L}$. The three columns
of Fig. 5 respectively comprise sample images from the illumination cone for
the face with one, two, or three light sources.

There are a number of points to note about this experiment. There was almost
no shadowing in the training images, yet there are strong attached shadows in
many of the sample images. These are particularly distinct in the images
generated with a single light source. Notice for example the sharp shadow across
the ridge of the nose in Column 1, Row 2 or the shadowing in Column 1, Row 4
where the light source is coming from behind the head. Notice also the
depression under the cheekbones in Column 2, Row 5, and the cleft in the chin revealed
in Column 1, Row 3. For the image in Column 3, Row 2, two of the light sources
are on opposite sides while the third one is coming from below; notice that both
ears and the bottom of the chin and nose are brightly illuminated while the rest
of the face is darker.

Fig. 4. Illumination Subspace Method: The top two rows of the figure show
all six of the original images used to construct the illumination subspace $\mathcal{L}$ for
the face. The bottom row of the figure shows three basis images that span the
illumination subspace $\mathcal{L}$ for the face.

To construct the cone for the desktop still life, we used our own collection
of nine images with little shadowing. The top row of Fig. 6 shows three of these
images. The second row of Fig. 6 shows the three basis images that span $\mathcal{L}$. The
lower three columns of Fig. 6 respectively comprise sample images from
the illumination cone for the desktop still life with one, two, or three light sources.

The variability in illumination in these images is so extreme that the edge
maps for these images would differ drastically. Notice in the image in Column
1, Row 4 that the shadow line on the bottle is distinct and that the left sides
of the phone, duck, and bottle are brightly illuminated. Throughout the scene,
notice that those points having comparable surface normals seem to be similarly
illuminated. Furthermore, notice that all of the nearly horizontal surfaces in the
bottom two images of the first column are in shadow since the light is coming
from below. In the image with two light sources shown at the bottom of Column
2, the sources are located on opposite sides and behind the objects. This leads
to a shadow line in the center of the bottle. The head of the wooden duck shows
a similar shadowing where its front and back are illuminated, but not the side.

Fig. 5. Random Samples from the Illumination Cone of a Face: The
three columns respectively comprise sample images from the illumination
cone with one, two, or three light sources.

Fig. 6. Illumination Subspace Method: The top row of the figure shows
three of the original nine images used to construct the illumination subspace
$\mathcal{L}$ for the still life. The second row shows the three basis images that span the
illumination subspace $\mathcal{L}$. The lower three columns respectively comprise
sample images from the illumination cone with one, two, or three light sources.

3 Dimension and Shape of the Illumination Cone 

In this section, we investigate the dimension of the illumination cone, and show 
that it is equal to the number of distinct surface normals. However, we conjecture 
that the shape of the cone is flat, with much of its volume concentrated near a 
low-dimensional subspace, and present empirical evidence to support this con- 
jecture. Finally, we show that the cones of two objects with the same geometry, 
but with separate albedo patterns, differ by a diagonal linear transformation. 



3.1 The Dimension of the Illumination Cone 

Given that the set of images of an object under variation in illumination is a
convex cone, it is natural to ask: What is the dimension of the cone in $\mathbb{R}^n$?
By this we mean, what is the dimension of the span of the vectors in the illumination cone C?
Why do we want to know the answer to this question? Because the complexity
of the cone may dictate the nature of the recognition algorithm. For example,
if the illumination cones are 1-D, i.e., rays in the positive orthant of $\mathbb{R}^n$, then a
recognition scheme based on normalized correlation could handle all of the
variation due to illumination. However, in general the cones are not one dimensional
unless the object is planar. To this end, we offer the following proposition.

Proposition 5. The dimension of the illumination cone C is equal to the num- 
ber of distinct surface normals. 

Proof. As with the proof of Proposition 1, we again represent each light source
direction by a point on the surface of the illumination sphere. Each cell on the
illumination sphere corresponds to the light source directions which produce a
particular set of images $P_i(\mathcal{L}_i)$. For every image in a set $P_i(\mathcal{L}_i)$, certain pixels
are always equal to zero, i.e., always in shadow. There exists a cell $S_0$ on the
illumination sphere corresponding to the light source directions which produce
$\mathcal{L}_0$, the set of images in which all pixels are always illuminated. There exists a
cell $S_d$ corresponding to the light source directions which produce a set of images
in which all pixels are always in shadow. Choose any point $s_b \in S_0$. The point
$s_d = -s_b$ is antipodal to $s_b$, and lies within $S_d$. Draw any half-meridian connecting
$s_b$ and $s_d$. Starting at $s_b$, follow the path of the half-meridian; it crosses $m$ distinct
great circles, and passes through $m$ different cells before entering $S_d$. Note that
the path of the half-meridian corresponds to a particular path of light source
directions, starting from a light source direction producing an image in which all
pixels are illuminated and ending at a light source direction producing an image
in which all pixels are in shadow. Each time the half-meridian crosses a great
circle, the pixel corresponding to the great circle becomes shadowed.

Take an image produced from any light source direction within the interior
of each cell through which the meridian passes, including $S_0$, but excluding $S_d$.
Arrange each of these $m$ images as column vectors in an $n \times m$ matrix $M$. By
elementary row operations, the matrix $M$ can be converted to its echelon form
$M^*$, and it is trivial to show that $M^*$ has exactly $m$ non-zero rows. Thus, the
rank of $M$ is $m$, and the dimension of C is at least $m$. Since there are only
$m$ distinct surface normals, the dimension of C cannot exceed $m$. Thus, the
dimension of C equals $m$.

Note that for images with $n$ pixels, this proposition indicates that the
dimension of the illumination cone is one for a planar object, is roughly $\sqrt{n}$ for a
cylindrical object, and is $n$ for a spherical object. But if the cone spans $\mathbb{R}^n$, what
fraction of the positive orthant does it occupy? In Section 3.3, we investigate this
question, conjecturing that the illumination cones for most objects occupy little
volume in the image space.
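
Proposition 5 can be checked numerically on a toy surface. The sketch below is an illustration under assumptions, not part of the chapter's experiments: the five normals, the albedos and the random light directions are invented for the example, and the normals are kept in one hemisphere, as they would be for a surface visible to a camera. It stacks images from many single light sources and compares the rank of the stack with the number of distinct normals.

```python
import numpy as np

def images_under(B, directions):
    """Columns are the images max(B s, 0) for each row s of directions."""
    return np.maximum(B @ directions.T, 0.0)

# Ten pixels sharing m = 5 distinct normals, each with its own albedo.
normals = np.array([[0, 0, 1], [1, 0, 1], [0, 1, 1], [-1, -1, 1], [1, 1, 2]], float)
normals /= np.linalg.norm(normals, axis=1, keepdims=True)
albedo = np.linspace(0.3, 1.0, 10)
B = albedo[:, None] * np.repeat(normals, 2, axis=0)

rng = np.random.default_rng(2)
dirs = rng.normal(size=(2000, 3))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
cone_dim = np.linalg.matrix_rank(images_under(B, dirs))

# Lights close to the viewing axis leave every pixel lit: only the 3-D subspace.
frontal = np.array([[0.1, 0, 1], [0, 0.1, 1], [-0.1, -0.1, 1], [0.05, -0.1, 1]], float)
subspace_dim = np.linalg.matrix_rank(images_under(B, frontal))
```

With these inputs the shadow-free images span three dimensions, while images from all directions span five, matching the number of distinct normals rather than the number of pixels.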

3.2 The Connection between Albedo and Cone Shape 

If two objects are similar in geometry, but differ in their respective albedo
patterns, then there is a simple linear relationship between their corresponding
illumination cones. Here, we consider two Lambertian objects that have the same
underlying geometry, but have differing albedo patterns (e.g., a Coke can and a
Pepsi can). In this case, the product of albedo and surface normals for the two
objects can be expressed as $B_1 = R_1 N$ and $B_2 = R_2 N$ where $N$ is an $n \times 3$
matrix of surface normals and $R_i$ is an $n \times n$ diagonal matrix whose diagonal
elements are positive and represent the albedo. The following proposition relates
the illumination cones of the two objects.

Proposition 6. If $C_1$ is the illumination cone for an object defined by $B_1 = R_1 N$
and $C_2$ is the illumination cone for an object defined by $B_2 = R_2 N$, then

$C_1 = \{R_1 R_2^{-1} x : x \in C_2\}$ and

$C_2 = \{R_2 R_1^{-1} x : x \in C_1\}$.

Proof. For every light source direction $s$, the corresponding images are given by
$x_1 = \max(B_1 s, 0) = R_1 \max(Ns, 0)$ and $x_2 = \max(B_2 s, 0) = R_2 \max(Ns, 0)$.
Since $R_1$ and $R_2$ are diagonal with positive diagonal elements, they are invertible.
Therefore, $x_1 = R_1 R_2^{-1} x_2$ and $x_2 = R_2 R_1^{-1} x_1$. Because these maps are linear,
the same relations hold for sums of such single-source images, and hence for the cones.

Thus, the cones for two objects with identical geometry but differing albedo 
patterns differ by a diagonal linear transformation. This fact can be applied 
when computing cones for objects observed by color (multi-band) cameras as 
noted in Section 4. Note that this proposition also holds when the objects are 
non-convex; since the partitioning of the illumination sphere is determined by 
the objects’ surface geometry, the set of shadowing configurations is identical for 
two objects with the same shape. The intensities of the illuminated pixels are 
related by the transformations given in the proposition. 
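
A short numerical check of this relationship is sketched below. The normals, albedos and light sources are random placeholders, and `transfer_image` is a hypothetical helper name; because the matrices are diagonal, the transformation reduces to a per-pixel ratio of albedos.

```python
import numpy as np

def transfer_image(x2, albedo1, albedo2):
    """Map an image of object 2 to the image of object 1 under the same light.

    Equivalent to multiplying by R1 @ inv(R2) for diagonal albedo matrices.
    """
    return x2 * (albedo1 / albedo2)

rng = np.random.default_rng(3)
n = 8
N = rng.normal(size=(n, 3))
N /= np.linalg.norm(N, axis=1, keepdims=True)       # shared geometry
albedo1 = rng.uniform(0.2, 1.0, size=n)             # diagonal of R1
albedo2 = rng.uniform(0.2, 1.0, size=n)             # diagonal of R2

s_a, s_b = np.array([0.3, -0.4, 1.0]), np.array([-1.0, 0.2, 0.1])
render = lambda albedo, s: np.maximum((albedo[:, None] * N) @ s, 0.0)

x1 = render(albedo1, s_a) + render(albedo1, s_b)    # object 1, two sources
x2 = render(albedo2, s_a) + render(albedo2, s_b)    # object 2, same two sources
x1_from_x2 = transfer_image(x2, albedo1, albedo2)
```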







3.3 Shape of the Illumination Cone 

While we have shown that an illumination cone is a convex, polyhedral cone
that can span $n$ dimensions if there are $n$ distinct surface normals, we have
not said how big it is in practice. Note that having a cone span $n$ dimensions
does not mean that it covers $\mathbb{R}^n$, since a convex cone is defined only by convex
combinations of its extreme rays. It is conceivable that an illumination cone
could completely cover the positive orthant of $\mathbb{R}^n$. However, the existence of an
object geometry that would produce this is unlikely. For such an object, it must
be possible to choose $n$ light source directions such that each of the $n$ facets is
illuminated independently.

On the other hand, if the illumination cones for objects are small and well 
separated, then recognition should be possible, even under extreme lighting con- 
ditions. We believe that the latter is true - that the cone has almost no volume 
in the image space. We offer the following conjecture: 

Conjecture 1. The shape of the cone is “flat,” i.e., most of its volume is
concentrated near a low-dimensional subspace.

While we have yet to prove this conjecture, the empirical investigations of [9, 15] 
and the one in the following section seem to support it. 

3.4 Empirical Investigation of the Shape of the Illumination Cones 

To investigate Proposition 5 and Conjecture 1, we have gathered several images,
taken under varying lighting conditions, of two objects: the corner of a cardboard
box and a Wilson tennis ball. For both objects, we computed the illumination
subspace by applying SVD to their corresponding sets of images. Using the estimated
illumination subspaces, we performed two experiments.

In the first experiment, we tried to confirm that the illumination spheres for 
both objects would appear as we would expect. For both the box and the tennis 
ball, we drew the great circles associated with each pixel on the illumination 
sphere, see Fig. 7. From Proposition 5, we would expect the illumination cone 
produced by the corner of the box to be three dimensional since the corner 
has only three faces. The illumination sphere should be partitioned into eight 
regions by three great circles, each meeting the other two orthogonally. This 
structure is evident in the figure. Yet, due to both image noise and the fact that 
the surface is not truly Lambertian, there is some small deviation of the great 
circles. Furthermore, the few pixels from the edge and corner of the box produce 
a few stray great circles. In contrast, the visible surface normals of the tennis 
ball should nearly cover half of the Gauss sphere and, therefore, the great circles 
should nearly cover the illumination sphere. Again, this structure is evident in 
the figure. 

In the second experiment, we plotted the eigenvalues of the matrix of extreme
rays for both the box and the tennis ball. The point of this experiment was to
compare the size and “flatness” of both cones. As discussed in Section 2.1, an
extreme ray $x_{ij}$ is an image created by a light source direction $s_{ij}$ lying at the
intersection of two or more great circles on the illumination sphere. The matrix
of extreme rays is simply the matrix whose columns are the vectorized images
$x_{ij}/|x_{ij}|$. We then performed SVD on the matrix of extreme rays for the box
corner and the matrix of extreme rays for the tennis ball. The corresponding
eigenvalues are plotted in decreasing order in Fig. 8.

Fig. 7. Examples of Illumination Spheres: On the left, the figure shows an
image of the corner of a cardboard box and its corresponding illumination sphere.
Note that the illumination sphere is, for the most part, partitioned into eight
regions by three great circles, each meeting the other two orthogonally. On the
right, the figure shows an image of a Wilson tennis ball and its corresponding
illumination sphere. Note that the great circles nearly cover the illumination
sphere.

From this figure we make the following observations. First, in the plot of the 
box corner there is a sharp drop-off after the third eigenvalue, indicating that 
most of the illumination cone is concentrated near a 3-D subspace of the image 
space. Second, the eigenvalues for the tennis ball do not drop off as quickly
as those for the box, indicating that the illumination cone for the tennis ball 
is larger than that for the box. And, third, the eigenvalues for both the box 
corner and the tennis ball diminish by at least two orders of magnitude within 
the first fifteen eigenvalues. Thus, in agreement with the above conjecture, the 
illumination cones appear to be concentrated near a low dimensional subspace. 
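
The second experiment can be imitated on synthetic surfaces. The sketch below is a toy version under stated assumptions: it takes the extreme-ray directions to be the cross products of pairs of rows of B, with both signs, following the construction that Section 2.1 is cited for, and it uses an idealized box corner and a random set of ball-like normals rather than the measured data behind Fig. 8. All names are hypothetical.

```python
import numpy as np
from itertools import combinations

def extreme_ray_matrix(B):
    """Columns are unit-norm images max(B s, 0) for s = +/- (b_i x b_j)."""
    B = np.asarray(B, dtype=float)
    cols = []
    for i, j in combinations(range(len(B)), 2):
        s = np.cross(B[i], B[j])
        for sign in (1.0, -1.0):
            x = np.maximum(B @ (sign * s), 0.0)
            norm = np.linalg.norm(x)
            if norm > 1e-12:                 # skip all-dark images
                cols.append(x / norm)
    return np.column_stack(cols)

def significant_values(M, rel_tol=1e-8):
    """Number of singular values above rel_tol times the largest one."""
    sv = np.linalg.svd(M, compute_uv=False)
    return int((sv > rel_tol * sv[0]).sum())

rng = np.random.default_rng(6)
# Box corner: 30 pixels on three mutually orthogonal faces.
box = rng.uniform(0.3, 1.0, size=(30, 1)) * np.repeat(np.eye(3), 10, axis=0)
# Ball-like surface: 30 distinct normals facing the camera.
ball = rng.normal(size=(30, 3))
ball[:, 2] = np.abs(ball[:, 2]) + 0.2
ball /= np.linalg.norm(ball, axis=1, keepdims=True)

box_count = significant_values(extreme_ray_matrix(box))
ball_count = significant_values(extreme_ray_matrix(ball))
```

For the noise-free box the spectrum stops after three values; the ball-like surface needs more, which is the qualitative contrast described above.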




Fig. 8. Eigenvalues for the Matrix of Extreme Rays: The figure shows a
plot in decreasing order of the eigenvalues of the matrix of extreme rays for the
illumination cone of the corner of a box and for the illumination cone of a tennis
ball. (Panel titles: “Plot of Box Eigenvalues for Matrix of Extreme Rays” and
“Plot of Ball Eigenvalues for Matrix of Extreme Rays”; axis label: “Eigenvalue”.)



We should point out that Epstein et al. [9] and Hallinan [15] performed a 
related experiment on images created by physically moving the light source to 
evenly sample the illumination sphere. They, too, found that the set of images 
of an object under variable illumination lies near a low dimensional subspace. 
Our results using synthesized images from the cone seem to complement their 
findings. 

4 Color 

Until now, we have neglected the spectral distribution of the light sources, the 
color of the surface, and the spectral response of the camera; here we extend the 
results of Section 2 to multi-spectral images. 

Let $\lambda$ denote the wavelength of light. Let $\rho_i(\lambda)$ denote the response for all
elements of the $i$th color channel. Let $R(\lambda)$ be a diagonal matrix whose elements
are the spectral reflectance functions of the facets, and let the rows of
$N \in \mathbb{R}^{n \times 3}$ be the surface normals of the facets. Finally, let $\tilde{s}(\lambda)$ and $\hat{s}$ be the power
spectrum and direction of the light source, respectively. Then, ignoring attached
shadows and the associated max operation, the $n$-pixel image $x_i$ produced by
color channel $i$ of a convex Lambertian surface from a single colored light is [18,
21]

$x_i = \int \rho_i(\lambda)\,(R(\lambda)N)\,(\tilde{s}(\lambda)\hat{s})\,d\lambda. \qquad (5)$

It is difficult to make limiting statements about the set of possible images
of a colored object when $\rho_i(\lambda)$, $R(\lambda)$ and $\tilde{s}(\lambda)$ are arbitrary. For example, if
we consider a particular object with a spectral reflectance function $R(\lambda)$ and
surface normals $N$, then without constraining assumptions on $\rho_i(\lambda)$ and $\tilde{s}(\lambda)$,
any image $x_i$ is obtainable. Consequently, we will consider two specific cases:
cameras with narrow-band spectral response and light sources with identical
spectral distributions.



4.1 Narrow-Band Cameras 

Following [29], if the sensing elements in each color channel have narrow-band
spectral response or can be made to appear narrow band [11], then $\rho_i(\lambda)$ can be
approximated by a Dirac delta function about some wavelength $\lambda_i$, and Eq. 5
can be rewritten as

$x_i = \rho_i(\lambda_i)\,(R(\lambda_i)N)\,(\tilde{s}(\lambda_i)\hat{s}) = \rho_i\,(R_i N)\,(\tilde{s}_i \hat{s}). \qquad (6)$

Note that $\rho_i$, $R_i$ and $N$ are constants for a given surface and camera whereas
$\tilde{s}_i$ and $\hat{s}$ depend on properties of the light source. Eq. 6 can be expressed using
the notation of Eq. 1 where $B = \rho_i R_i N$ and $s = \tilde{s}_i \hat{s}$. The diagonal elements of
$\rho_i R_i$ are the effective albedo of the facets for color channel $i$. For $c$ narrow-band
color channels, the color image $x = [x_1^T \mid x_2^T \mid \cdots \mid x_c^T]^T$ formed by stacking up the $c$
images for each channel can be considered as a point in $\mathbb{R}^{cn}$. Under a single light
source, $x$ is a function of $\hat{s}$ and $\tilde{s}_1, \ldots, \tilde{s}_c$. Taken over all light source directions
and spectral distributions, the set of images from a single light source without
shadowing is a $(c+2)$-dimensional manifold in $\mathbb{R}^{cn}$. It is easy to show that this
manifold is embedded in a $3c$-dimensional linear subspace of $\mathbb{R}^{cn}$, and that any
point (image) in the intersection of this linear subspace with the positive orthant
of $\mathbb{R}^{cn}$ can be achieved by three colored light sources.

A basis for this $3c$-dimensional subspace can be constructed from three color
images without shadowing. This is equivalent to independently constructing $c$
three-dimensional linear subspaces in $\mathbb{R}^n$, one for each color channel. Note that
$\rho_i R_i N$ spans subspace $i$. When attached shadows are considered, an illumination
cone can be constructed in $\mathbb{R}^n$ for each color channel independently. The cones
for each color channel are closely related since they arise from the same surface;
effectively the albedo matrix $R_i$ may be different for each color channel, but
the surface normals $N$ are the same. As demonstrated in Section 3.2, the cones
for two surfaces with the same geometry but different albedo patterns differ by
a diagonal linear transformation. Now, the set of all multi-spectral images of
a convex Lambertian surface is a convex polyhedral cone in $\mathbb{R}^{cn}$ given by the
Cartesian product of the $c$ individual cones. Following Proposition 5, this color
cone spans at most $cm$ dimensions where $m$ is the number of distinct surface
normals.



4.2 Light Sources with Identical Spectra 

Consider another imaging situation in which a color camera ($c$ channels, not
necessarily narrow-band) observes a scene where the number and location of
the light sources are unknown, but the power spectral distributions of all light
sources are identical (e.g., incandescent bulbs). Equation 5 can then be rewritten
as

$x_i = \left( \int \rho_i(\lambda)\,\tilde{s}(\lambda)\,R(\lambda)\,d\lambda \right) N \hat{s}. \qquad (7)$

In this case, the integral is independent of the light source direction and scales
with its intensity. If we define the intensity of the light source to be $\tilde{s} = \int \tilde{s}(\lambda)\,d\lambda$,
then $s = \tilde{s}\hat{s}$ and $R_i = \frac{1}{\tilde{s}} \int \rho_i(\lambda)\,\tilde{s}(\lambda)\,R(\lambda)\,d\lambda$. Equation 7 can then be expressed
as

$x_i = R_i N s.$

For $c$ color channels, the color image $x \in \mathbb{R}^{cn}$ is given by

$x = [R_1 \mid R_2 \mid \cdots \mid R_c]^T N s.$

Consequently, the set of images of the surface without shadowing is a
three-dimensional linear subspace of $\mathbb{R}^{cn}$ since $R_i$ and $N$ are constants. Following
Section 2, the set of all images with shadowing is a convex polyhedral cone
that spans $m$ dimensions of $\mathbb{R}^{cn}$. Thus, when the light sources have identical
power spectra (even if the camera is not narrow-band), the set of all images
is significantly smaller than considered above since the color measured at each
pixel is independent of the light source direction.
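
The difference between the two color cases can be seen in a small numerical experiment. The sketch below is an illustration only, with random normals and per-channel albedos standing in for a real surface and with shadowing ignored: it stacks the channel images into one vector and compares the rank of the resulting image sets when every channel shares one light intensity (identical spectra) and when each channel has its own (narrow-band camera, arbitrary spectra). For the narrow-band case the camera response is folded into the albedo.

```python
import numpy as np

rng = np.random.default_rng(5)
n, c = 12, 3                                   # pixels and color channels
N = rng.normal(size=(n, 3))
N /= np.linalg.norm(N, axis=1, keepdims=True)  # hypothetical surface normals
R = rng.uniform(0.2, 1.0, size=(c, n))         # R[i] holds the diagonal of R_i

def color_image_identical_spectra(s):
    """x = [R_1 | ... | R_c]^T N s, with s = intensity times direction."""
    return np.concatenate([R[i] * (N @ s) for i in range(c)])

def color_image_narrow_band(direction, intensities):
    """Each channel i sees the same direction with its own intensity."""
    return np.concatenate([R[i] * (N @ (intensities[i] * direction)) for i in range(c)])

lights = rng.normal(size=(50, 3))
spectra = rng.uniform(0.1, 1.0, size=(50, c))
X_same = np.column_stack([color_image_identical_spectra(s) for s in lights])
X_narrow = np.column_stack([color_image_narrow_band(s, w) for s, w in zip(lights, spectra)])

rank_same = np.linalg.matrix_rank(X_same)      # three-dimensional subspace
rank_narrow = np.linalg.matrix_rank(X_narrow)  # 3c-dimensional subspace
```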

5 Face Recognition Using the Illumination Cone 

Until this point, we have focused on properties of the set of images of an object 
under varying illumination. Here, we utilize these properties to develop repre- 
sentations and algorithms for recognizing objects, namely faces, under differ- 
ent lighting conditions. Face recognition is a challenging yet well-studied prob- 
lem [8, 32]; the difficulty in face recognition arises from the fact that many faces 
are geometrically and photometrically very similar, yet there is a great deal of 
variability in the images of an individual due to changes of pose, lighting, facial 
expression, facial hair, hair style, makeup, age, etc. Here we focus solely on il- 
lumination, and in this section, we empirically compare these new methods to 
a number of popular techniques such as correlation [6] and Eigenfaces [24, 39] 
as well as more recently developed techniques such as distance to linear sub- 
space [3, 15, 29, 34]. 

5.1 Constructing the Illumination Cone Representation of Faces 

In the experiments reported below, illumination cones are constructed using vari- 
ations of the illumination subspace method and the cast shadow method. When 
implementing these methods, there are two problems which must be addressed. 

The first problem that arises with these two methods is with the estimation
of $B^*$. For even a convex object whose Gaussian image covers the Gauss sphere,
there is only one light source direction - the viewing direction - for which no
point on the surface is in shadow. For any other light source direction, shadows
will be present. For faces, which are not convex, shadowing in the modeling
images is likely to be more pronounced. When SVD is used to estimate $B^*$ from
images with shadows, these systematic errors can bias the estimation
significantly. Therefore, alternative ways are needed to estimate $B^*$ that take into
account the fact that some data values should not be used in the estimation.

Fig. 9. Example images from each subset (Subset 1 through Subset 5) of the
Harvard Database used to test the algorithms.

The next problem is that usually m, the number of independent normals in 
B, can be large (more than a thousand) hence the number of extreme rays needed 
to completely define the illumination cone can run in the millions. Therefore, we 
must approximate the cone in some fashion; in this work, we choose to use a 
small number of extreme rays (images). In Section 3.4 it was shown empirically 
that the cone is flat (i.e., elements lie near a low dimensional linear subspace), 
and so the hope is that a sub-sampled cone will provide an approximation that 
leads to good recognition performance. In our experience, around 60-80 images 
are sufficient, provided that the corresponding light source directions $s_{ij}$ more or
less uniformly sample the illumination sphere. The resulting cone C* is a subset
of the object’s true cone C. In the Sampling Method described in Section 2.2,
an alternative approximation to C is obtained by directly sampling the space of
light source directions rather than generating the extreme rays through Eq. 4.
While the resulting images form the extreme rays of the representation C* and
lie on the boundary of the true cone C, they are not necessarily extreme rays of
C.



Estimating B*. Using singular value decomposition directly on the images
leads to a biased estimate of B* due to shadows. In addition, portions of some of 
the images from the Harvard database used in our experiments were saturated. 
Both shadows formed under a single light source and saturations can be detected 
by thresholding and labeled as “missing” - these pixels do not satisfy the linear 
equation x = Bs. Thus, we need to estimate the 3-D linear subspace B* from 
images with missing values. 

Define the data matrix for c images of an individual to be X = [x_1 ... x_c]. 
If there were no shadowing, X would be rank 3, and we could use SVD to 
decompose X into X = B*S* where S* is the 3 x c matrix of the light source 
directions for all c images. To estimate a basis B* for the 3-D linear subspace 
L from image data with missing elements, we have implemented a variation of 
[35]; see also [37, 20]. 

The overview of this method is as follows: without doing any row or column 
permutations, sift out all the full rows (with no invalid data) of matrix X to form 
a full sub-matrix X̃. Perform SVD on X̃ and get an initial estimate of S*. Fix 
S* and estimate each of the rows of B* independently using least squares. Then, 
fix B* and estimate each of the light source directions s_j independently. Repeat 
the last two steps until the estimates converge. The inner workings of the algorithm 
are as follows: Let b_i be the ith row of B*, and let x_i be the ith row of X. Let p 
be the indices of non-missing elements in x_i, let x_i^p be the row obtained by 
taking only the non-missing elements of x_i, and let S^p similarly be the submatrix 
of S* consisting of the columns with indices in p (S* is 3 x c, so each image 
corresponds to one column). Then, the ith row of B* is given by 

$$b_i = x_i^p \, (S^p)^{\dagger}$$

where (S^p)^† is the pseudo-inverse of S^p. With the new estimate of B* at hand, 
let x_j be the jth column of X, let p be the indices of non-missing elements in 
x_j, and let x_j^p be the column obtained by taking only the non-missing elements 
of x_j. Let B^p similarly be the submatrix of B* consisting of rows with indices 
in p. Then, the jth light source direction is given by 

$$s_j = (B^p)^{\dagger} \, x_j^p$$

After the new set of light sources S* has been calculated, the last two steps 
can be repeated until the estimate of B* converges. The algorithm is very well 
behaved, converging to the global minimum within 10-15 iterations. Though it 
is possible to converge to a local minimum, we never observed this in simulation 
or in practice. 
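The alternating estimation described above can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `estimate_basis_with_missing`, the boolean `valid` mask (True where a pixel is neither shadowed nor saturated), and the fixed iteration count are all hypothetical choices made for illustration. The sketch also assumes that X has at least three full rows and that every row and column keeps at least three valid entries.

```python
import numpy as np

def estimate_basis_with_missing(X, valid, n_iter=15):
    """Alternating least squares for X ~ B S with missing entries.

    X     : n x c data matrix (one image per column).
    valid : n x c boolean mask, True where the entry may be used.
    Returns B (n x 3) and S (3 x c).
    """
    n, c = X.shape
    # Initial S from an SVD of the rows that have no missing data.
    full_rows = valid.all(axis=1)
    _, sv, Vt = np.linalg.svd(X[full_rows], full_matrices=False)
    S = np.diag(sv[:3]) @ Vt[:3]            # 3 x c
    B = np.zeros((n, 3))
    for _ in range(n_iter):
        # Fix S, solve each row of B from its valid entries only.
        for i in range(n):
            p = valid[i]
            B[i] = X[i, p] @ np.linalg.pinv(S[:, p])
        # Fix B, solve each light source column from its valid entries only.
        for j in range(c):
            p = valid[:, j]
            S[:, j] = np.linalg.pinv(B[p]) @ X[p, j]
    return B, S
```

Note that B and S are recovered only up to an invertible 3 x 3 transformation, so it is the product B S (or the column span of B) that should be compared against ground truth.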



Enforcing Integrability To predict cast shadows, we must reconstruct a sur- 
face and to do this, the vector field B* must correspond to an integrable normal 
field. Since no method has been developed to enforce integrability during the 







Fig. 10. Figure 4 showed six original images of a face and three images spanning 
the linear subspace B*. a) From B* , the surface is reconstructed up to a GBR 
transformation, b) Sample images from database (left column); closest image in 
illumination cone without cast shadows (middle column); and closest image in 
illumination cone with cast shadows (right column) 



estimation of B*, we enforce it afterwards. That is, given B* computed as described 
above, we estimate a matrix A ∈ GL(3) such that B*A corresponds to 
an integrable normal field; the development follows [42]. 

Consider a continuous surface defined as the graph of z(x, y), and let b be the 
corresponding normal field scaled by an albedo (scalar) field. The integrability 
constraint for a surface is z_xy = z_yx, where subscripts denote partial derivatives. 
In turn, b must satisfy: 

$$\left(\frac{b_2}{b_3}\right)_x = \left(\frac{b_1}{b_3}\right)_y$$

To estimate A such that b^T(x, y) = b*^T(x, y) A, we expand this out. Letting 
the columns of A be denoted by A_1, A_2, A_3 yields 

$$(b^{*\top} A_3)(b_x^{*\top} A_2) - (b^{*\top} A_2)(b_x^{*\top} A_3) = (b^{*\top} A_3)(b_y^{*\top} A_1) - (b^{*\top} A_1)(b_y^{*\top} A_3)$$

which can be expressed as 

$$b^{*\top} S_1 \, b^*_x = b^{*\top} S_2 \, b^*_y \qquad (8)$$

where S_1 = A_3 A_2^T - A_2 A_3^T and S_2 = A_3 A_1^T - A_1 A_3^T. 

S_1 and S_2 are skew-symmetric matrices and have three degrees of freedom each. 
Equation 8 is linear in the six elements of S_1 and S_2. From the estimate of B* 
obtained using the method in Section 5.1, discrete approximations of the partial 
derivatives (b*_x and b*_y) are computed, and then SVD is used to solve for the 
six elements of S_1 and S_2. In [42], it was shown that the elements of S_1 and 
S_2 are cofactors of A, and a simple method for computing A from the cofactors 
was presented. This procedure only determines six degrees of freedom of A. The 
other three correspond to the generalized bas relief (GBR) transformation [2] 
and can be chosen arbitrarily since GBR preserves integrability. The surface 
corresponding to B*A differs from the true surface by GBR, i.e., z*(x, y) = 
λ z(x, y) + μ x + ν y for arbitrary λ, μ, ν with λ ≠ 0. 
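The claim that GBR preserves integrability can be checked in one step. The short derivation below is added as a worked step and is not taken from the source; it uses only the GBR form of the surface, with scale λ ≠ 0 and plane coefficients μ and ν, and the integrability constraint on z.

```latex
\begin{align*}
z^*(x,y) &= \lambda\, z(x,y) + \mu x + \nu y, \\
z^*_x &= \lambda\, z_x + \mu, \qquad z^*_y = \lambda\, z_y + \nu, \\
z^*_{xy} &= \lambda\, z_{xy} = \lambda\, z_{yx} = z^*_{yx}.
\end{align*}
```

The additive plane terms vanish under the mixed second derivative, so any surface in the GBR orbit of an integrable surface is itself integrable. This is why the integrability constraint alone leaves three degrees of freedom of A undetermined.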



Generating a GBR Surface The preceding sections give a method for estimating 
the matrix B* and then enforcing integrability; we now reconstruct 
the corresponding surface z(x, y). Note that z(x, y) is not a Euclidean reconstruction 
of the face, but a representative element of the orbit under a GBR 
transformation. Recall that both shading and shadowing will be correct for images 
synthesized from a transformed surface. 

To find z(x, y), we use the variational approach presented in [19]. A surface 
z(x, y) is fit to the given components of the gradient p and q by minimizing 
the functional 

$$\iint_{\Omega} (z_x - p)^2 + (z_y - q)^2 \, dx \, dy$$

whose Euler equation reduces to ∇²z = p_x + q_y. By enforcing the right natural 
boundary conditions and employing an iterative scheme that uses a discrete 
approximation of the Laplacian, we can generate the surface z(x, y) [19]. Then, 
it is a simple matter to construct an illumination cone representation that incorporates 
cast shadows. Using ray-tracing techniques for a given light source 
direction, we can determine the cast shadow regions and correct the extreme 
rays of C*. 
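The surface-fitting step can be illustrated with a small NumPy sketch. It is not the scheme of [19]: as a stated simplification it minimizes the discrete sum of squared differences between neighbouring heights and the averaged gradient samples, using damped Jacobi sweeps, so that pixels on the border simply use the neighbours they have. The function name `integrate_gradient`, the damping factor `omega`, unit pixel spacing, and the convention that p varies along columns and q along rows are all assumptions made for illustration.

```python
import numpy as np

def integrate_gradient(p, q, n_iter=3000, omega=0.8):
    """Fit heights z (H x W) to gradient samples p ~ z_x and q ~ z_y.

    x runs along columns (axis 1) and y along rows (axis 0), unit spacing.
    The result is defined up to an additive constant; it is returned with zero mean.
    """
    H, W = p.shape
    # Target height differences between horizontally / vertically adjacent pixels.
    dx = 0.5 * (p[:, :-1] + p[:, 1:])    # z[:, 1:] - z[:, :-1]
    dy = 0.5 * (q[:-1, :] + q[1:, :])    # z[1:, :] - z[:-1, :]
    # Number of neighbours of each pixel (2 at corners, 3 on edges, 4 inside).
    deg = np.zeros((H, W))
    deg[:, :-1] += 1
    deg[:, 1:] += 1
    deg[:-1, :] += 1
    deg[1:, :] += 1
    z = np.zeros((H, W))
    for _ in range(n_iter):
        acc = np.zeros((H, W))
        acc[:, :-1] += z[:, 1:] - dx     # prediction from the right neighbour
        acc[:, 1:] += z[:, :-1] + dx     # prediction from the left neighbour
        acc[:-1, :] += z[1:, :] - dy     # prediction from the neighbour below
        acc[1:, :] += z[:-1, :] + dy     # prediction from the neighbour above
        # Damped Jacobi update; damping suppresses the checkerboard mode.
        z = (1.0 - omega) * z + omega * acc / deg
        z -= z.mean()
    return z
```

Because only differences of z enter the objective, the absolute height is arbitrary; the sketch fixes it by removing the mean after every sweep. For large images a multigrid or direct sparse solver would converge far faster than plain Jacobi sweeps.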

Figures 4 and 10 demonstrate the process of constructing the cone C*. Figure 
4 shows the training images for one individual in the database as well as the 
columns of the matrix B*. Figure 10.a shows the reconstruction of the surface 
up to a GBR transformation. The left column of Fig. 10.b shows sample images 
in the database; the middle column shows the closest image in the illumination 
cone without cast shadows; and the right column shows the closest image in the 
illumination cone with cast shadows. Note that the background and hair have 
been masked. 







5.2 Recognition 

The cone C* can be used in a natural way for face recognition, and in experi- 
ments described below, we compare three recognition algorithms to the proposed 
method. From a set of face images labeled with the person’s identity (the learning 
set) and an unlabeled set of face images from the same group of people (the 
test set), each algorithm is used to identify the person in the test images. For 
more details of the comparison algorithms, see [3]. We assume that the face has 
been located and aligned within the image. 

The simplest recognition scheme is a nearest neighbor classifier in the image 
space [6]. An image in the test set is recognized (classified) by assigning to it the 
label of the closest point in the learning set, where distances are measured in 
the image space. If all of the images are normalized to have zero mean and unit 
variance, this procedure is equivalent to choosing the image in the learning set 
that best correlates with the test image. Because of the normalization process, 
the result is independent of light source intensity. 
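The nearest neighbor scheme with normalization can be sketched as follows. This is an illustrative sketch only; the helper names `normalize` and `classify_nearest` are hypothetical, and it assumes no image is constant (otherwise the standard deviation is zero).

```python
import numpy as np

def normalize(image):
    """Flatten an image and scale it to zero mean and unit variance."""
    v = np.asarray(image, dtype=float).ravel()
    v = v - v.mean()
    return v / v.std()

def classify_nearest(test_image, train_images, train_labels):
    """Return the label of the training image closest to the test image."""
    x = normalize(test_image)
    dists = [np.linalg.norm(x - normalize(t)) for t in train_images]
    return train_labels[int(np.argmin(dists))]
```

For normalized vectors of length n, the squared distance equals 2n minus twice the inner product, so the smallest distance corresponds to the largest correlation. Multiplying a test image by a positive gain or adding an offset leaves its normalized form unchanged, which is why the result does not depend on light source intensity.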

As correlation methods are computationally expensive and require great 
amounts of storage, it is natural to pursue dimensionality reduction schemes. 
A technique now commonly used in computer vision, particularly in face recognition, 
is principal components analysis (PCA), which is popularly known as 
Eigenfaces [15, 27, 24, 39]. Given a collection of training images x_i ∈ IR^n, a 
linear projection of each image y_i = W x_i to an f-dimensional feature space is 
performed. A face in a test image x is recognized by projecting x into the feature 
space, and nearest neighbor classification is performed in IR^f. The projection 
matrix W is chosen to maximize the scatter of all projected samples. It has been 
shown that when f equals the number of training images, the Eigenface and 
Correlation methods are equivalent (see [3, 27]). One proposed method for handling 
illumination variation in PCA is to discard from W the three most significant 
principal components; in practice, this yields better recognition performance [3]. 
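A minimal sketch of building such a projection is given below. It follows the usual PCA recipe (subtract the mean image, take the leading right singular vectors), which is an assumption on our part since the text only states the projection itself; the names `eigenface_projection`, `project`, and the `drop_first` parameter are hypothetical.

```python
import numpy as np

def eigenface_projection(train, f, drop_first=0):
    """Return (W, mean): W is f x n with orthonormal rows.

    train      : N x n array, one flattened training image per row.
    drop_first : number of leading principal components to discard,
                 e.g. 3 to drop the components dominated by lighting.
    """
    X = np.asarray(train, dtype=float)
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    W = Vt[drop_first:drop_first + f]
    return W, mean

def project(W, mean, image):
    """Project a flattened image into the feature space."""
    return W @ (np.asarray(image, dtype=float).ravel() - mean)
```

Classification then proceeds by nearest neighbor on the projected vectors. Calling `eigenface_projection(train, 20, drop_first=3)` would select components four through twenty-three.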

A third approach is to model the illumination variation of each face as a 
three-dimensional linear subspace L as described in Section 2.1. To perform 
recognition, we simply compute the distance of the test image to each linear 
subspace and choose the face corresponding to the shortest distance. We call 
this recognition scheme the Linear Subspace method [2]; it is a variant of the 
photometric alignment method proposed in [34] and is related to [16, 29]. While 
this models the variation in intensity when the surface is completely illuminated, 
it does not model shadowing. 
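The distance computation used by the Linear Subspace method can be sketched as an orthogonal projection. The sketch assumes each person is represented by an n x 3 basis matrix with linearly independent columns; the function names are hypothetical.

```python
import numpy as np

def distance_to_subspace(x, B):
    """Euclidean distance from image vector x to the column space of B (n x 3)."""
    Q, _ = np.linalg.qr(B)               # orthonormal basis of span(B)
    residual = x - Q @ (Q.T @ x)         # component of x orthogonal to the subspace
    return np.linalg.norm(residual)

def classify_by_subspace(x, bases):
    """Return the index of the subspace closest to x."""
    return int(np.argmin([distance_to_subspace(x, B) for B in bases]))
```

Because the projection allows negative coefficients on every pixel, an image with attached shadows (where intensities are clipped at zero) generally does not lie in the subspace, which is the limitation noted above.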

Finally, given a test image x, recognition using illumination cones is per- 
formed by first computing the distance of the test image to each cone, and then 
choosing the face that corresponds to the shortest distance. Since each cone is 
convex, the distance can be found by solving a convex optimization problem. In 
particular, the non-negative linear least squares technique contained in Matlab 
was used in our implementation, and this algorithm has computational complexity 
O(n e^2) where n is the number of pixels and e is the number of extreme rays. 
Two different variations for constructing the cone and a method for increasing 
the speed are considered. 
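The distance from an image to a cone spanned by a finite set of extreme rays is a non-negative least squares problem. The sketch below does not reproduce the Matlab routine mentioned above; as a plainly swapped-in technique it uses projected gradient descent, which is simple to state in NumPy but slower to converge. The function name `distance_to_cone`, the iteration count, and the layout of `R` (extreme rays as columns) are assumptions.

```python
import numpy as np

def distance_to_cone(x, R, n_iter=2000):
    """Distance from x to the cone {R a : a >= 0}.

    x : image vector of length n.
    R : n x e matrix whose columns are the sampled extreme rays.
    Returns (distance, a) with a the non-negative coefficients found.
    """
    # Step size 1/L, where L is the Lipschitz constant of the gradient.
    L = np.linalg.norm(R, 2) ** 2
    a = np.zeros(R.shape[1])
    for _ in range(n_iter):
        grad = R.T @ (R @ a - x)
        # Gradient step followed by projection onto the non-negative orthant.
        a = np.maximum(a - grad / L, 0.0)
    return np.linalg.norm(R @ a - x), a
```

Recognition then amounts to evaluating this distance for each person's cone and choosing the smallest. An active-set solver such as the Lawson-Hanson algorithm would normally be preferred for accuracy and speed.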







Fig. 11. The highlighted lines of longitude and latitude indicate the light source 
directions for Subsets 1 through 5. Each intersection of a longitudinal and latitu- 
dinal line on the right side of the illustration sphere has a corresponding image 
in the database. 



5.3 Experiments and Results 

To test the effectiveness of these recognition algorithms, we performed a series 
of experiments on a database from the Harvard Robotics Laboratory in which 
lighting had been systematically varied [15, 16]. In each image in this database, 
a subject held his/her head steady while being illuminated by a dominant light 
source. The space of light source directions, which can be parameterized by 
spherical angles, was then sampled in 15° increments. See Figure 11. From this 
database, we used 660 images of 10 people (66 of each). We extracted five subsets 
to quantify the effects of varying lighting. Sample images from each subset are 
shown in Fig. 9. Subset 1 (respectively 2, 3, 4, 5) contains 60 (respectively 90, 
130, 170, 210) images for which both the longitudinal and latitudinal angles 
of light source direction are within 15° (respectively 30°, 45°, 60°, 75°) of the 
camera axis. 

All of the images were cropped (96 by 84 pixels) within the face so that 
the contour of the head was excluded. For the Eigenface and correlation tests, 
the images were normalized to have zero mean and unit variance, as this im- 
proved the performance of these methods. For the Eigenface method, we used 
twenty principal components - recall that performance approaches correlation 
as the dimension of the feature space is increased [3, 27]. Since the first three 
principal components are primarily due to lighting variation and since recogni- 
tion rates can be improved by eliminating them, error rates are also presented 
when principal components four through twenty-three are used. For the cone ex- 
periments, we tested two variations: in the first variation (Cones-attached), the 
representation was constructed ignoring cast shadows by essentially using the 
illumination subspace method except that B* is estimated using the technique 
described in Section 5.1. In the second variation (Cones-cast), the representation 






was constructed using the cast shadow method as described in Section 5.1. In 
both variations, recognition was performed by choosing the face corresponding 
to the smallest computed distance to cone. 

In our quest to speed up the recognition process using cones, we also employed 
principal components analysis (PCA). The collection of all images in the cones 
(with cast shadows) is projected down to a 100-dimensional feature space. This 
is achieved by performing a linear projection of the form y_i = W x_i, where the 
projection matrix W is chosen to maximize the scatter of all projected samples. A 
face in an image, normalized to have zero mean and unit variance, is recognized 
by first projecting the image down to this 100-dimensional feature space and 
then performing nearest neighbor classification. 

Mirroring the extrapolation experiment described in [3], each method was 
trained on samples from Subset 1 (near frontal illumination) and then tested 
using samples from Subsets 2, 3, 4 and 5. (Note that when tested on Subset 
1, all methods performed without error). Figure 12 shows the result from this 
experiment. 



5.4 Discussion of Face Recognition Results 

From the results of this experiment, we draw the following conclusions: 

- The illumination cone representation outperforms all of the other techniques. 

- When cast shadows are included in the illumination cone, error rates are 
improved. 

- PCA of cones with cast shadows outperforms all of the other methods except 
distance to cones with cast shadows. The small degradation in error rates is 
offset by the considerable speed up of more than one order of magnitude. 

- For very extreme illumination (Subset 5), the Correlation and Eigenface 
methods completely break down, and exhibit results that are only slightly 
better than chance (90% error rate). The cone method performs significantly 
better, but certainly not well enough to be usable in practice. At this point, 
more experimentation is required to determine if recognition rates can be 
improved by either using more sampled extreme rays or by improving the 
image formation model. 

6 Conclusions and Discussion 

In this chapter we have shown that the set of images of a convex object with a 
Lambertian reflectance function, under all possible lighting conditions at infinity, 
is a convex, polyhedral cone. Furthermore, we have shown that this cone can be 
learned from three properly chosen images and that the dimension of the cone 
equals the number of distinct surface normals. We have shown that for objects 
with an arbitrary reflectance function and a non-convex shape, the set of images 
is still a convex cone and that these results can be easily extended to color 
images. For non-convex Lambertian surfaces, three images are still sufficient for 




Extrapolating from Subset 1 

                                    Error Rate (%) 
Method               Subset 2    Subset 3    Subset 4    Subset 5 
                       30°         45°         60°         75° 
Correlation             2.2        46.2        74.7        86.6 
Eigenface               3.3        48.5        76.5        86.6 
Eigenface w/o 1st 3     0.0        32.3        60.0        80.6 
Linear subspace         0.0         3.9        22.4        50.8 
Cones-attached          0.0         2.3        17.1        43.8 
Cones-cast (PCA)        0.0         1.5        13.5        39.8 
Cones-cast              0.0         0.0        10.0        37.3 

Fig. 12. Extrapolation: When each of the methods is trained on images with 
near frontal illumination (Subset 1), the graph and corresponding table show 
the relative performance under more extreme light source conditions. 



constructing the cone, and this is accomplished by first reconstructing the surface 
up to a shadow-preserving generalized bas relief transformation. We have applied 
these results to develop a face recognition technique based on computing distance 
to cone, and have demonstrated that the method is superior to methods which do 
not model illumination effects, particularly the role of shadowing. Nevertheless, 
there remain a number of extensions and open issues which we discuss below. 








6.1 Interreflection 

A surface is not just illuminated by the light sources but also through inter- 
reflections from points on the surface itself [1, 13]. For a Lambertian surface, 
the image with interreflection x' is related to the image that would be formed 
without interreflection x by 



$$x' = (I - RK)^{-1} x$$

where I is the identity matrix, R is a diagonal matrix whose ith diagonal element 
denotes the albedo of facet i, and K is known as the interreflection kernel [28]. 
When there is no shadowing, all images lie in a 3-D linear space that would 
be generated from Eq. 1 by a pseudo-surface whose normals and albedo B' are 
given by B' = (I - RK)^{-1} B [28, 29]. From Proposition 4, the set of all possible 
images is still a cone. While B' can be learned from only three images, the set 
of shadowing configurations and the partitioning of the illumination sphere is 
generated from B, not B' . So, it remains an open question how the cone can be 
constructed from only three images. 



6.2 Effects of Change in Pose 

All of the previous analysis in the chapter has dealt solely with variation in 
illumination. Yet, a change in the object’s pose creates a change in the per- 
ceived image. If an object undergoes a rotation or translation, how does the 
illumination cone deform? The illumination cone of the object in the new pose 
is also convex, but almost certainly different from the illumination cone of the 
object in the old pose. This raises the question: Is there a simple transformation, 
obtainable from a small number of images of the object seen from different 
views, which when applied to the illumination cone characterizes these changes? 
Alternatively, is it practical to simply sample the pose space constructing an 
illumination cone for each pose? Nayar and Murase have extended their appear- 
ance manifold representation to model illumination variation for each pose as a 
3-D linear subspace [29]. However, their representation does not account for the 
complications produced by attached shadows. 



6.3 Object Recognition 

It is important to stress that the illumination cones are convex. If they are non- 
intersecting, then the cones are linearly separable. That is, they can be separated 
by (n - 1)-dimensional hyperplanes in IR^n passing through the origin. Furthermore, 
since convex sets remain convex under linear projection, then for any projection 
direction lying in a separating hyperplane, the projected convex sets will also 
be linearly separable. For d different objects represented by d linearly separable 
convex cones, there always exists a linear projection of the image space to a 
(d - 1)-dimensional space such that all of the projected sets are again linearly separable. 
So, an alternative to classification based on measuring distance to the cones in 
IR^n is to find a much lower dimensional space in which to do classification. In our 
Fisherface method for recognizing faces under variable illumination and facial 
expression, projection directions were chosen to maximize separability of the 
object classes [3]; a similar approach can be taken here. 

The face recognition experiment was limited to the available dataset from 
the Harvard Robotics Laboratory. To perform more extensive experimentation, 
we are constructing a geodesic lighting rig that supports 64 computer controlled 
xenon strobes. Using this rig, we will be able to modify the illumination at frame 
rates and gather an extensive image database covering a broader range of lighting 
conditions including multiple sources. Note that the images in the Harvard face 
database were obtained with a single source, and so all of the images in the 
test set were on or near the boundary of the cone. Images formed with multiple 
light sources may lie in the interior, and we have not tested these methods with 
multiple light sources. Our new database will permit such experimentation. 



Acknowledgments 

The authors would like to thank David Mumford and Alan Yuille for their many 
comments, David Forsyth for his insights on interreflections, and David Jacobs, 
Michael Langer, Joao Hespanha and Elena Dotsenko for many relevant discus- 
sions. The authors would also like to thank Peter Hallinan for providing images 
from the Harvard Face Database. 

References 

[1] R. Bajcsy, S. Lee, and A. Leonardis. Detection of diffuse and specular interface 
reflections and inter-reflections by color image segmentation. Int. J. Computer 
Vision, 17(3):241-272, March 1996. 

[2] P. Belhumeur, D. Kriegman, and A. Yuille. The bas-relief ambiguity. In Proc. 
IEEE Conf. on Comp. Vision and Patt. Recog., pages 1040-1046, 1997. 

[3] P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman. Eigenfaces vs. Fisherfaces: 
Recognition using class specific linear projection. IEEE Trans. Pattern Anal. 
Mach. Intelligence, 19(7):711-720, 1997. Special Issue on Face Recognition. 

[4] P. N. Belhumeur and D. J. Kriegman. What is the set of images of an object 
under all possible lighting conditions. In Proc. IEEE Conf. on Comp. Vision and 
Patt. Recog., pages 270-277, 1996. 

[5] T. Binford. Generic surface interpretation: Observability model. In Proc. of the 
4th International Symposium on Robotics Research, Santa Cruz, CA, August 1987. 

[6] R. Brunelli and T. Poggio. Face recognition: Features vs templates. IEEE Trans. 
Pattern Anal. Mach. Intelligence, 15(10):1042-1053, 1993. 

[7] M. Canon, C. Cullum Jr., and E. Polak. Theory of Optimal Control and Mathe- 
matical Programming. McGraw-Hill, New York, 1970. 

[8] R. Chellappa, C. Wilson, and S. Sirohey. Human and machine recognition of 
faces: A survey. Proceedings of the IEEE, 83(5):705-740, 1995. 

[9] R. Epstein, P. Hallinan, and A. Yuille. 5 ± 2 Eigenimages suffice: An empirical 
investigation of low-dimensional lighting models. Technical Report 94-11, Harvard 
University, 1994. 





[10] R. Epstein, A. Yuille, and P. N. Belhumeur. Learning object representations from 
lighting variations. In Proc. of the Int. Workshop on Object Representation for 
Computer Vision, page 179, 1996. 

[11] G. Finlayson, M. Drew, and B. Funt. Spectral sharpening: Sensor transformations 
for improved color constancy. J. Opt. Soc. Am. A, 11:1553-1563, 1994. 

[12] D. Forsyth and M. Fleck. Body plans. In Proc. IEEE Conf. Computer Vision 
and Pattern Recognition, 1997. 

[13] D. Forsyth and A. Zisserman. Reflections on shading. IEEE Trans. Pattern Anal. 
Mach. Intelligence, 13(7):671-679, 1991. 

[14] A. Georghiades, D. Kriegman, and P. Belhumeur. Illumination cones for recog- 
nition under variable lighting: Faces. In Proc. IEEE Conf. on Comp. Vision and 
Patt. Recog., pages 52-59, 1998. 

[15] P. Hallinan. A low-dimensional representation of human faces for arbitrary light- 
ing conditions. In Proc. IEEE Conf. on Comp. Vision and Patt. Recog., pages 
995-999, 1994. 

[16] P. Hallinan. A Deformable Model for Face Recognition Under Arbitrary Lighting 
Conditions. PhD thesis, Harvard University, 1995. 

[17] H. Hayakawa. Photometric stereo under a light-source with arbitrary motion. 
JOSA-A, 11(11):3079-3089, Nov. 1994. 

[18] B. Horn. Computer Vision. MIT Press, Cambridge, Mass., 1986. 

[19] B. Horn and M. Brooks. The variational approach to shape from shading. Com- 
puter Vision, Graphics and Image Processing, 35:174-208, 1992. 

[20] D. Jacobs. Linear fitting with missing data: Applications to structure from motion 
and characterizing intensity images. In Proc. IEEE Conf. on Comp. Vision and 
Patt. Recog., pages 206-212, 1997. 

[21] G. Klinker, S. Shafer, and T. Kanade. Image segmentation and reflection analysis 
through color. Int. J. Computer Vision, 2(1):7-32, June 1988. 

[22] J. Koenderink and A. van Doorn. Bidirectional reflection distribution function 
expressed in terms of surface scattering modes. In Proc. European Conf. on 
Computer Vision, pages II:28-39, 1996. 

[23] D. Kriegman and P. Belhumeur. What shadows reveal about object structure. In 
Proc. European Conf. on Computer Vision, pages 399-414, 1998. 

[24] L. Sirovich and M. Kirby. Low-dimensional procedure for the characterization 
of human faces. J. Optical Soc. of America A, 2:519-524, 1987. 

[25] M. Langer and S. Zucker. A ray-based computational model of light sources and 
illumination. In Physics Based Modeling Workshop in Computer Vision, 1995. 

[26] Y. Moses, Y. Adini, and S. Ullman. Face recognition: The problem of compensat- 
ing for changes in illumination direction. In Proc. European Conf. on Computer 
Vision, pages 286-296, 1994. 

[27] H. Murase and S. Nayar. Visual learning and recognition of 3-D objects from 
appearance. Int. J. Computer Vision, 14(1):5-24, 1995. 

[28] S. Nayar, K. Ikeuchi, and T. Kanade. Shape from interreflections. IJCV, 6(3):173- 
195, August 1991. 

[29] S. Nayar and H. Murase. Dimensionality of illumination in appearance matching. 
IEEE Conf. on Robotics and Automation, 1996. 

[30] M. Oren and S. Nayar. Generalization of the Lambertian model and implications 
for machine vision. Int. J. Computer Vision, 14:227-251, 1996. 

[31] R. Rockafellar. Convex Analysis. Princeton University Press, Princeton, 1970. 

[32] A. Samal and P. Iyengar. Automatic recognition and analysis of human faces and 
facial expressions: A survey. Pattern Recognition, 25:65-77, 1992. 







[33] A. Shashua. Geometry and Photometry in 3D Visual Recognition. PhD thesis, 
MIT, 1992. 

[34] A. Shashua. On photometric issues to feature-based object recognition. Int. J. 
Computer Vision, 21:99-122, 1997. 

[35] H. Shum, K. Ikeuchi, and R. Reddy. Principal component analysis with missing 
data and its application to polyhedral object modeling. PAMI, 17(9):854-867, 
September 1995. 

[36] H. Tagare and R. deFigueiredo. A framework for the construction of reflectance 
maps for machine vision. Comp. Vision, Graphics, and Image Proces., 57(3):265- 
282, May 1993. 

[37] C. Tomasi and T. Kanade. Shape and motion from image streams under or- 
thography: a factorization method. International Journal of Computer Vision, 
9(2):134-154, 1992. 

[38] K. Torrance and E. Sparrow. Theory for off-specular reflection from roughened 
surfaces. JOSA, 57:1105-1114, 1967. 

[39] M. Turk and A. Pentland. Eigenfaces for recognition. J. of Cognitive Neuroscience, 
3(1), 1991. 

[40] S. Ullman and R. Basri. Recognition by a linear combination of models. A. I. 
Memo 1152, MIT, Aug. 1989. 

[41] N. Watts. Calculating the principal views of a polyhedron. Technical Report CS 
Tech. Report 234, Rochester University, 1987. 

[42] A. Yuille and D. Snow. Shape and albedo from multiple images using integrability. 
In Proc. IEEE Conf. on Comp. Vision and Patt. Recog., pages 158-164, 1997. 




Shadows, Shading, and Projective Ambiguity 



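Peter N. Belhumeur^1, David J. Kriegman^2, and Alan L. Yuille^3 

^1 Center for Computational Vision and Control, Yale University, New Haven, CT 06520 
^2 Beckman Institute, University of Illinois, Urbana-Champaign, Urbana, IL 61801 
^3 Smith-Kettlewell Eye Research Institute, San Francisco, CA 94115 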



Abstract. In a scene observed from a fixed viewpoint, the set of shadow 
curves in an image changes as a point light source (nearby or at infinity) 
assumes different locations. We show that for any finite set of point light 
sources illuminating an object viewed under either orthographic or perspective 
projection, there is an equivalence class of object shapes having 
the same set of shadows. Members of this equivalence class differ by a 
four parameter family of projective transformations, and the shadows 
of a transformed object are identical when the same transformation is 
applied to the light source locations. Under orthographic projection, this 
family is the generalized bas-relief (GBR) transformation, and we show 
that the GBR transformation is the only family of transformations of an 
object's shape for which the complete set of imaged shadows is identical. 
Furthermore, for objects with Lambertian surfaces illuminated by 
distant light sources, the equivalence class of object shapes which preserves 
shadows also preserves surface shading. Finally, we show that given multiple 
images under differing and unknown light source directions, it is 
possible to reconstruct an object's shape up to these transformations 
from the shadows alone. 



1 Introduction 

In his fifteenth century Treatise on Painting [1], Leonardo da Vinci errs in his 
analysis of shadows while comparing painting and relief sculpture: 

As far as light and shade are concerned low relief fails both as sculpture 
and as painting, because the shadows correspond to the low nature of the 
relief, as for example in the shadows of foreshortened objects, which will 
not exhibit the depth of those in painting or in sculpture in the round. 

It is true that, when illuminated by the same light source, a relief surface 
and a surface “in the round” will cast different shadows. However, Leonardo’s 
statement appears to overlook the fact that for any flattening of the surface 
relief, there is a corresponding change in the light source direction such that the 
shadows appear the same. This is not restricted to classical reliefs but, as we 
will later show, applies equally to a greater set of projective transformations. 



D.A. Forsyth et al. (Eds.): Shape, Contour LNCS 1681, pp. 132-151, 1999. 
© Springer- Verlag Berlin Heidelberg 1999 




[Figure 1 panels: Original, Transformed, Original (2), Transformed (2)] 

Fig. 1. An illustration of the effect of applying a generalized perspective bas-relief 
(GPBR) transformation to a scene composed of a teapot resting on a 
supporting plane. The first image shows the original teapot. The second image 
shows the teapot after having undergone a GPBR transformation with 
( 1234 ) (0 0 0 1)) with respect to the viewpoint used to generate 
the first image. (The GPBR transformation is defined in Eq. 2.) Note that 
the attached and cast shadows as well as the occluding contour are identical 
in the first two images. The third image shows the original teapot from a second 
viewpoint. The fourth image reveals the nature of the GPBR transformation, 
showing the transformed teapot from the same viewpoint as used for the third 
image. 



More specifically, when an object is viewed from a fixed viewpoint, there is a 
four parameter family of projective transformations of the object’s structure and 
the light source locations such that the images of the shadows remain the same. 
This family of projective transformations is such that it restricts surface points 
to move along the lines of sight, i.e., it fixes the lines passing through the focal 
point. Furthermore, if the surface has Lambertian reflectance [19, 12] and is 
viewed orthographically, then for any of the above mentioned transformations of 
the surface, there is a corresponding transformation of the surface albedo such 
that the surface shading remains constant. 

It follows that when light source positions are unknown, neither shadows nor 
shading (for orthographically viewed objects with Lambertian reflectance) reveal 
the object’s Euclidean structure. Yet in all past work on reconstruction from 
shadows [11, 17, 31, 6, 13, 14, 21], shape from shading [12, 23], and photometric 
stereo [12, 27, 30], it is explicitly assumed that the direction or location of the 
light source is known. 

In Section 2, we explain the details of the shadowing ambiguity. We show 
that seen from a fixed viewpoint under perspective projection, two surfaces produce 
the same shadows if they differ by a particular projective transformation, 
which we call the Generalized Perspective Bas-Relief (GPBR) transformation. 
See Figure 1 for an example of this transformation. This result holds for any 
number of proximal or distant point light sources. Furthermore, under conditions 
where perspective projection can be approximated by orthographic projection, 
this transformation is the Generalized Bas-Relief (GBR) transformation [3]. 








In Section 3, we explain the details of the shading ambiguity. We show that seen from a fixed viewpoint under orthographic projection and illuminated by light sources at infinity, two surfaces produce the same shading if they differ by a GBR transformation. As with the result on shadows, this result holds for any number of point light sources.

In Section 4, we show that the GBR transformation is unique in that any two smooth surfaces which produce the same shadows must differ by a GBR. (The result is developed only for surfaces which are convex in shape.)

Finally, in Section 5, we propose an algorithm for reconstructing, from the attached shadow boundaries, the structure of an object up to a GBR transformation. The algorithm assumes that the object is viewed orthographically and that it is illuminated by a set of point light sources at infinity. We do not propose this algorithm with the belief that its present form has great applicability, but rather we give it to demonstrate that under ideal conditions information from shadows alone is enough to determine the structure of the object up to a GBR transformation.

2 Shadowing Ambiguity 

Let us define two objects as being shadow equivalent if there exist two sets of point light sources $S$ and $S'$ such that for every light source in $S$ illuminating one object, there exists a light source in $S'$ illuminating the second object such that the shadowing in both images is identical. Let us further define two objects as being strongly shadow equivalent if for any light source illuminating one object, there exists a source illuminating the second object such that the shadowing is identical, i.e., $S$ is the set of all point light sources. In this section we will show that two objects are shadow equivalent if they differ by a particular set of projective transformations.

Consider a camera-centered coordinate system whose origin is at the focal point, whose $x$- and $y$-axes span the image plane, and whose $z$-axis points in the direction of the optical axis. Let a smooth surface $f$ be defined with respect to this coordinate system and lie in the half-space $z > 0$. Since the surface is smooth, the surface normal $\mathbf{n}(\mathbf{p})$ is defined at all points $\mathbf{p} \in f$.

We model illumination as a collection of point light sources, located nearby or at infinity. Note that this is a restriction of the lighting model presented by Langer and Zucker [20], which permits anisotropic light sources whose intensity is a function of direction. In this paper, we will represent surfaces, light sources, and the camera center as lying in either a two or three dimensional real projective space ($\mathbb{RP}^2$ or $\mathbb{RP}^3$). (For a concise treatment of real projective spaces, see [22].) This allows a unified treatment of both point light sources that are nearby (proximal) or distant (at infinity) and camera models that use perspective or orthographic projection.

When a point light source is proximal, its coordinates can be expressed as $\mathbf{s} = (s_x, s_y, s_z)^T$. In projective (homogeneous) coordinates, the light source $\mathsf{s} \in \mathbb{RP}^3$ can be written as $\mathsf{s} = (s_x, s_y, s_z, 1)^T$. (Note that different fonts are used to distinguish between Euclidean and projective coordinates.) When a point light source is at infinity, all light rays are parallel, and so one is concerned with the direction of the light source. The direction can be represented as a unit vector in $\mathbb{R}^3$ or as a point on an illumination sphere, $\mathbf{s} \in S^2$. In projective coordinates, the fourth homogeneous coordinate of a point at infinity is zero, and so the light source can be expressed as $\mathsf{s} = (s_x, s_y, s_z, 0)^T$. (Note that when the light source at infinity is represented in projective coordinates, the antipodal points from $S^2$ must be equated.)

Shadows, Shading, and Projective Ambiguity    135

For a single point source $\mathsf{s} \in \mathbb{RP}^3$, let us define the set of light rays as the lines in $\mathbb{RP}^3$ passing through $\mathsf{s}$. For any $\mathsf{p} \in \mathbb{RP}^3$ with $\mathsf{p} \neq \mathsf{s}$, there is a single light ray passing through $\mathsf{p}$. Naturally, it is the intersection of the light rays with the surface $f$ which determines the shadows. We differentiate between two types of shadows: attached shadows and cast shadows [2, 26]. See Figures 2 and 3.

A surface point $\mathbf{p}$ lies on the border of an attached shadow for light source $\mathbf{s}$ if and only if it satisfies both a local and a global condition:

Local Attached Shadow Condition: The light ray through $\mathbf{p}$ lies in the tangent plane to the surface at $\mathbf{p}$. Algebraically, this condition can be expressed as $\mathbf{n}(\mathbf{p}) \cdot (\mathbf{p} - \mathbf{s}) = 0$ for a nearby light source (here $\mathbf{p}$ and $\mathbf{s}$ denote Euclidean coordinates) and as $\mathbf{n}(\mathbf{p}) \cdot \mathbf{s} = 0$ for a distant light source (here $\mathbf{s}$ denotes the direction of the light source). A point $\mathbf{p}$ which satisfies at least the local condition is called a local attached shadow boundary point.

Global Attached Shadow Condition: The light ray does not intersect the surface between $\mathbf{p}$ and $\mathbf{s}$, i.e., the light source is not occluded at $\mathbf{p}$.
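As an illustration of the two local conditions, the following minimal Python sketch evaluates them on a unit sphere centred at the origin, for which the surface normal at a point equals the point itself. The sphere, the source positions, and the function names are assumptions chosen for the example and do not come from the text; the global (occlusion) condition is not tested.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def local_attached_nearby(n, p, s):
    """Residual n(p) . (p - s) for a nearby source at Euclidean position s."""
    return dot(n, tuple(pi - si for pi, si in zip(p, s)))

def local_attached_distant(n, s_dir):
    """Residual n(p) . s for a distant source with direction s_dir."""
    return dot(n, s_dir)

# Hypothetical test surface: unit sphere at the origin, so n(p) = p.
p_near = (0.5, math.sqrt(0.75), 0.0)  # boundary point for a source at (2, 0, 0)
p_far = (0.0, 1.0, 0.0)               # boundary point for direction (1, 0, 0)

res_near = local_attached_nearby(p_near, p_near, (2.0, 0.0, 0.0))
res_far = local_attached_distant(p_far, (1.0, 0.0, 0.0))
```

A residual of zero (up to rounding) marks a local attached shadow boundary point; any other value means the light ray is not tangent to the surface there.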

Now consider applying an arbitrary projective transformation $a : \mathbb{RP}^3 \to \mathbb{RP}^3$ to both the surface and the light source. Under this transformation, let $\mathsf{p}' = a(\mathsf{p})$ and $\mathsf{s}' = a(\mathsf{s})$.

Lemma 1. A point $\mathsf{p}$ on a smooth surface is a local attached shadow boundary point for point light source $\mathsf{s}$ iff $\mathsf{p}'$ on a transformed surface is a local attached shadow boundary point for point light source $\mathsf{s}'$.

Proof. At a local attached shadow boundary point $\mathsf{p}$, the line defined by $\mathsf{p} \in \mathbb{RP}^3$ and light source $\mathsf{s} \in \mathbb{RP}^3$ lies in the tangent plane at $\mathsf{p}$. Since the order of contact (e.g., tangency) of a curve and surface is preserved under projective transformations, the line defined by $\mathsf{p}'$ and $\mathsf{s}'$ lies in the tangent plane at $\mathsf{p}'$.

Cast shadows occur at points on the surface that face the light source, but where some other portion of the surface lies between the shadowed points and the light source. A point $\mathbf{p}$ lies on the boundary of a cast shadow for light source $\mathbf{s}$ if and only if it similarly satisfies both a local and a global condition:

Local Cast Shadow Condition: The light ray through $\mathbf{p}$ grazes the surface at some other point $\mathbf{q}$ (i.e., $\mathbf{q}$ lies on an attached shadow boundary). A point $\mathbf{p}$ which satisfies at least the local condition is called a local cast shadow boundary point.

Global Cast Shadow Condition: The only intersection of the surface and the light ray between $\mathbf{p}$ and $\mathbf{s}$ is at $\mathbf{q}$.



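Lemma 2. A point $\mathsf{p}$ on a smooth surface is a local cast shadow boundary point for point light source $\mathsf{s}$ iff $\mathsf{p}'$ on a transformed surface is a local cast shadow boundary point for point light source $\mathsf{s}'$.

Proof. For a local cast shadow boundary point $\mathsf{p} \in \mathbb{RP}^3$ and light source $\mathsf{s} \in \mathbb{RP}^3$, there exists another point $\mathsf{q} \in \mathbb{RP}^3$ on the line defined by $\mathsf{p}$ and $\mathsf{s}$ such that $\mathsf{q}$ lies on an attached shadow. Since collinearity is preserved under projective transformations, $\mathsf{p}'$, $\mathsf{q}'$ and $\mathsf{s}'$ are collinear. Hence from Lemma 1, $\mathsf{q}'$ is also an attached shadow point.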

Taken together, Lemmas 1 and 2 indicate that under an arbitrary projective transformation of a surface and light source, the set of local shadow curves is a projective transformation of the local shadow curves of the original surface and light source. However, these two lemmas do not imply that the two surfaces are shadow equivalent, since the transformed points may project to different image points, or the global conditions may not hold.



2.1 Perspective Projection: GPBR 

We will further restrict the set of projective transformations. Modeling the camera as a function $\pi : \mathbb{RP}^3 \to \mathbb{RP}^2$, we require that for any point $\mathsf{p}$ on the surface, $\pi(\mathsf{p}) = \pi(a(\mathsf{p}))$ where $a$ is a projective transformation; that is, $\mathsf{p}$ and $a(\mathsf{p})$ must project to the same image point. We will consider two specific camera models in turn: perspective projection $\pi_p$ and orthographic projection $\pi_o$.

Without loss of generality, consider a pinhole perspective camera with unit focal length located at the origin of the coordinate system and with the optical axis pointed in the direction of the $z$-axis. Letting the homogeneous coordinates of an image point be given by $\mathsf{u} \in \mathbb{RP}^2$, pinhole perspective projection of $\mathsf{p} \in \mathbb{RP}^3$ is given by $\mathsf{u} = \Pi_p \mathsf{p}$ where

$$\Pi_p = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \tag{1}$$



For $\pi_p(\mathsf{p}) = \pi_p(a(\mathsf{p}))$ to be true for any point $\mathsf{p}$, the transformation must move $\mathsf{p}$ along the optical ray between the camera center and $\mathsf{p}$. This can be accomplished by the projective transformation $a : \mathsf{p} \mapsto A\mathsf{p}$ where

$$A = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ a_1 & a_2 & a_3 & a_4 \end{bmatrix} \tag{2}$$

Fig. 2. In this 2-d illustration of the generalized perspective bas-relief transformation (GPBR), the lower shadow is an attached shadow while the upper one is composed of both attached and cast components. A GPBR transformation has been applied to the left surface, yielding the right one. Note that under GPBR, all surface points and the light source are transformed along the optical rays through the center of projection. By transforming the light source from $\mathbf{s}$ to $\mathbf{s}'$, the shadows are preserved.



We call this transformation the Generalized Perspective Bas-Relief (GPBR) transformation. In Euclidean coordinates, the transformed surface and light source are given by

$$\mathbf{p}' = \frac{1}{\mathbf{a} \cdot \mathbf{p} + a_4}\,\mathbf{p}, \qquad \mathbf{s}' = \frac{1}{\mathbf{a} \cdot \mathbf{s} + a_4}\,\mathbf{s} \tag{3}$$

where $\mathbf{a} = (a_1, a_2, a_3)^T$. Figure 2 shows a 2-d example of GPBR being applied to a planar curve and a single light source. The effect is to move points on the surface and the light sources along lines through the camera center in a manner that preserves shadows. The sign of $\mathbf{a} \cdot \mathbf{p} + a_4$ plays a critical role: if it is positive, all points on $f$ move inward or outward from the camera center, remaining in the half-space $z > 0$. On the other hand, if the sign is negative for some points on $f$, these points will move through the camera center to points with $z < 0$, i.e., they will not be visible to the camera. The equation $\mathbf{a} \cdot \mathbf{p} + a_4 = 0$ defines a plane which divides $\mathbb{R}^3$ into these two cases; all points on this plane map to the plane at infinity. A similar effect on the transformed light source location is determined by the sign of $\mathbf{a} \cdot \mathbf{s} + a_4$.
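The behaviour of Eq. 3 can be checked numerically. The sketch below is a minimal illustration, assuming hypothetical parameter values and helper names (`gpbr`, `perspective`, `collinear`) that are not part of the original text. It applies the transformation to a surface point, a light source, and a third point on the light ray, so that one can confirm the perspective image of the point is unchanged and that collinearity along the light ray is preserved.

```python
def gpbr(p, a, a4):
    """Map a Euclidean point p to p / (a . p + a4), as in Eq. 3."""
    w = a[0] * p[0] + a[1] * p[1] + a[2] * p[2] + a4
    if w <= 0:
        raise ValueError("a . p + a4 must be positive for the point to stay visible")
    return tuple(c / w for c in p)

def perspective(p):
    """Pinhole projection with unit focal length: (x/z, y/z)."""
    return (p[0] / p[2], p[1] / p[2])

def collinear(p, q, r, tol=1e-9):
    """True if the cross product (q - p) x (r - p) vanishes."""
    u = [q[i] - p[i] for i in range(3)]
    v = [r[i] - p[i] for i in range(3)]
    cross = (u[1] * v[2] - u[2] * v[1],
             u[2] * v[0] - u[0] * v[2],
             u[0] * v[1] - u[1] * v[0])
    return all(abs(c) < tol for c in cross)

a, a4 = (0.1, 0.0, 0.2), 1.0   # hypothetical GPBR parameters
p = (1.0, 2.0, 4.0)            # surface point
s = (0.0, 5.0, 2.0)            # proximal light source
q = tuple((pi + si) / 2 for pi, si in zip(p, s))  # another point on the light ray

p2, s2, q2 = gpbr(p, a, a4), gpbr(s, a, a4), gpbr(q, a, a4)
```

Comparing `perspective(p)` with `perspective(p2)` and calling `collinear(p2, q2, s2)` exercises the two properties used in the argument; the guard on `w` mirrors the sign condition discussed above.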



Proposition 1. The image of the shadow curves for a surface $f$ and light source $\mathbf{s}$ is identical to the image of the shadow curves for a surface $f'$ and light source $\mathbf{s}'$ transformed by a GPBR if $\mathbf{a} \cdot \mathbf{s} + a_4 > 0$ and $\mathbf{a} \cdot \mathbf{p} + a_4 > 0$ for all $\mathbf{p} \in f$.






Proof. Since GPBR is a projective transformation, Lemmas 1 and 2 show that the local attached and cast shadow curves on the transformed surface $f'$ from light source $\mathbf{s}'$ are a GPBR transformation of the local shadow curves on $f$ from light source $\mathbf{s}$. For any point $\mathsf{p}$ on the surface and any GPBR transformation $A$, we have $\Pi_p A \mathsf{p} = \Pi_p \mathsf{p}$, and so the images of the local shadow curves are identical.

To show that the global condition for an attached shadow is also satisfied, we note that projective transformations preserve collinearity; therefore, the only intersections of the line defined by $\mathbf{s}'$ and $\mathbf{p}'$ with $f'$ are transformations of the intersections of the line defined by $\mathbf{s}$ and $\mathbf{p}$ with $f$. Within each light ray (a projective line), the points are subjected to a projective transformation; in general, the order of the transformed intersection points on the line may be a combination of a cyclic permutation and a reversal of the order of the original points. However, the restriction that $\mathbf{a} \cdot \mathbf{p} + a_4 > 0$ for all $\mathbf{p} \in f$ and that $\mathbf{a} \cdot \mathbf{s} + a_4 > 0$ has the effect of preserving the order of points between $\mathbf{p}$ and $\mathbf{s}$ on the original line and between $\mathbf{p}'$ and $\mathbf{s}'$ on the transformed line.

It should be noted that for any $\mathbf{a}$ and $a_4$, there exists a light source $\mathbf{s}$ such that $\mathbf{a} \cdot \mathbf{s} + a_4 < 0$. When $f$ is illuminated by such a source, the transformed source passes through the camera center, and the global shadowing conditions may not be satisfied. Hence, two objects differing by GPBR are not strongly shadow equivalent. On the other hand, for any bounded set of light sources and bounded object $f$, there exists a set of $a_1, \ldots, a_4$ such that $\mathbf{a} \cdot \mathbf{s} + a_4 > 0$ and $\mathbf{a} \cdot \mathbf{p} + a_4 > 0$. Hence, there exists a set of objects which are shadow equivalent.

Since the shadow curves of multiple light sources are the union of the shadow curves from the individual light sources, this also holds for multiple light sources. It should also be noted that the occluding contours (silhouette) of $f$ and $f'$ are identical, since the camera center is a fixed point under GPBR and the occluding contour is the same as the attached shadow curve produced by a light source located at the camera center.

Figure 1 shows an example of the GPBR transformation being applied to a scene containing a teapot resting on a support plane. The images were generated using a ray tracing package: the scene contained a single proximal point light source, the surfaces were modeled as Lambertian, and a perspective camera model was used. When the light source is transformed with the surface, the shadows are the same for both the original and transformed scenes. Even the shading is similar in both images, so much so that it is nearly impossible to distinguish the two surfaces. However, from another viewpoint, the effect of the GPBR transformation on the object's shape is apparent.

This result complements past work on structure from motion in which the aim of structure recovery is a weaker, non-Euclidean representation, such as affine [1 24 2 2], projective [9], or ordinal [10].



2.2 Orthographic Projection: GBR 

When a camera is distant and can be modeled as orthographic projection, the visual rays are all parallel to the direction of the optical axis. In $\mathbb{RP}^3$, these rays intersect at the camera center, which is a point at infinity. Without loss of generality, consider the viewing direction to be in the direction of the $z$-axis and the $x$- and $y$-axes to span the image plane. Again, letting the homogeneous coordinates of an image point be given by $\mathsf{u} \in \mathbb{RP}^2$, orthographic projection of $\mathsf{p} \in \mathbb{RP}^3$ can be expressed as $\mathsf{u} = \Pi_o \mathsf{p}$ where

$$\Pi_o = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \tag{4}$$

Fig. 3. The image points that lie in shadow for a surface under light source $\mathbf{s}$ are identical to those in shadow for a transformed surface under light source $\mathbf{s}'$. In this 2-d illustration, the lower shadow is an attached shadow while the upper one is composed of both attached and cast components. A generalized bas-relief transformation with both flattening and an additive plane has been applied to the left surface, yielding the right one.

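Now, let us consider another set of projective transformations $g : \mathbb{RP}^3 \to \mathbb{RP}^3$. For $\pi_o(\mathsf{p}) = \pi_o(g(\mathsf{p}))$ to be true for any point $\mathsf{p}$, the transformation $g$ must move $\mathsf{p}$ along the viewing direction. This can be accomplished by the projective transformation $g : \mathsf{p} \mapsto G\mathsf{p}$ where

$$G = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ g_1 & g_2 & g_3 & g_4 \\ 0 & 0 & 0 & 1 \end{bmatrix} \tag{5}$$

with $g_3 > 0$. The mapping $g$ is an affine transformation which was introduced in [3] and was called the generalized bas-relief (GBR) transformation. Consider the effect of applying GBR to a surface parameterized as the graph of a depth function, $(x, y, f(x, y))$. This yields a transformed surface

$$\begin{bmatrix} x' \\ y' \\ z' \end{bmatrix} = \begin{bmatrix} x \\ y \\ g_1 x + g_2 y + g_3 f(x, y) + g_4 \end{bmatrix} \tag{6}$$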



See Figure 3 for an example. The parameter $g_3$ has the effect of scaling the relief of the surface, $g_1$ and $g_2$ characterize an additive plane, and $g_4$ provides a depth offset. As described in [3], when $g_1 = g_2 = 0$ and $0 < g_3 < 1$, the resulting transformation is simply a compression of the surface relief, as in relief sculpture.

Proposition 2. The image of the shadow curves for a surface $f$ and light source $\mathbf{s}$ is identical to the image of the shadow curves for a surface $f'$ and light source $\mathbf{s}'$ transformed by any GBR.

Proof. The proof follows that of Proposition 1.

It should be noted that Proposition 2 applies to both nearby light sources and those at infinity. However, in contrast to the GPBR transformation, nearby light sources do not move to infinity, nor do light sources at infinity become nearby light sources, since GBR is an affine transformation which fixes the plane at infinity. Since Proposition 2 holds for any light source, all objects differing by a GBR transformation are strongly shadow equivalent.
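For a distant source, the preservation of attached shadow boundaries under GBR can be seen in a few lines. The sketch below assumes a surface given by its depth gradient $(f_x, f_y)$ at one image point and uses hypothetical helper names and parameter values; it is an illustration, not the authors' code. It computes the unnormalized normal before and after the transformation, together with the transformed light direction, so that $\mathbf{n}' \cdot \mathbf{s}'$ can be compared with $g_3\,\mathbf{n} \cdot \mathbf{s}$.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normal(fx, fy):
    """Unnormalized inward normal (-f_x, -f_y, 1) of the graph z = f(x, y)."""
    return (-fx, -fy, 1.0)

def gbr_gradient(fx, fy, g):
    """Gradient of f' = g1*x + g2*y + g3*f + g4."""
    g1, g2, g3, _ = g
    return (g3 * fx + g1, g3 * fy + g2)

def gbr_direction(s, g):
    """A direction (point at infinity) is mapped without the offset g4."""
    g1, g2, g3, _ = g
    return (s[0], s[1], g1 * s[0] + g2 * s[1] + g3 * s[2])

g = (0.4, -0.3, 0.5, 2.0)      # hypothetical GBR parameters, g3 > 0
fx, fy = 0.3, -0.2             # depth gradient at some image point
s = (0.5, 0.1, 1.0)            # distant light source direction

n = normal(fx, fy)
n_t = normal(*gbr_gradient(fx, fy, g))
s_t = gbr_direction(s, g)
before, after = dot(n, s), dot(n_t, s_t)
```

Because `after` equals `g[2] * before`, the zero set of $\mathbf{n} \cdot \mathbf{s}$ (the local attached shadow boundary) is the same in both scenes, and for $g_3 > 0$ the sign, and hence the lit side, is also the same.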

An implication of Propositions 1 and 2 is that when an object is observed from a fixed viewpoint (whether under perspective or orthographic projection), one can at best reconstruct its surface up to a four parameter family of transformations (GPBR or GBR) from shadow or occluding contour information, irrespective of the number of images and number of light sources. Under the same conditions, it is impossible to distinguish (recognize) two objects that differ by these transformations from shadows or silhouettes.

3 Shading Ambiguity 

Let us define two objects as being strongly shading equivalent if for any light source illuminating one object, there exists a source illuminating the second object such that the shading is identical. In this section we will show that two objects with surfaces having Lambertian reflectance [19, 12] are strongly shading equivalent if they differ by any of the set of GBR transformations described in the previous section. Here we consider distant illumination (parallel illuminating rays) of objects viewed under orthographic projection (parallel lines of sight).

Consider again a camera-centered coordinate system whose origin is at the focal point, whose $x$- and $y$-axes span the image plane, and whose $z$-axis points in the direction of the optical axis. In this coordinate system, the depth of every visible point in the scene can be expressed as

$$z = f(x, y)$$

where $f$ is a piecewise differentiable function. The graph $(x, y, f(x, y))$ defines a surface which will also be denoted by $f$. The direction of the inward pointing surface normal $\mathbf{n}(x, y)$ can be expressed as

$$\mathbf{n}(x, y) = \begin{bmatrix} -f_x \\ -f_y \\ 1 \end{bmatrix} \tag{7}$$

where $f_x$ and $f_y$ denote the partial derivatives of $f$ with respect to $x$ and $y$ respectively.

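Once we restrict ourselves to orthographic projection, we no longer need the full machinery of projective coordinates. A visible point $\mathbf{p}$ on the surface $f$ has Euclidean coordinates $\mathbf{p} = (x, y, f(x, y))^T$. As done in [6], we write the GBR transformation on a surface point $\mathbf{p}$ as $\mathbf{p}' = G\mathbf{p} + (0, 0, g_4)^T$ where $G$ has been rewritten as

$$G = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ g_1 & g_2 & g_3 \end{bmatrix} \tag{8}$$

Under the matrix product operation, the set $\mathrm{GBR} = \{G\}$ forms a subgroup of $GL(3)$ with

$$G^{-1} = \frac{1}{g_3} \begin{bmatrix} g_3 & 0 & 0 \\ 0 & g_3 & 0 \\ -g_1 & -g_2 & 1 \end{bmatrix}$$

Also, note that for image point $(x, y)$, the relation between the direction of the surface normal of $f'$ and $f$ is given by $\mathbf{n}' \cong G^{-T}\mathbf{n}$, where $G^{-T} \equiv (G^T)^{-1} = (G^{-1})^T$. (As shown in [3], this is the only linear transformation of the surface's normal field which preserves integrability.)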

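Letting the albedo of a Lambertian surface $f$ be denoted by $a(x, y)$, the intensity image produced by a light source $\mathbf{s}$ can be expressed as

$$I_{f,a,\mathbf{s}}(x, y) = \Psi_{f,\mathbf{s}}(x, y)\, \mathbf{b}^T(x, y)\, \mathbf{s}$$

where $\mathbf{b}(x, y)$ is the product of the albedo $a(x, y)$ of the surface and the inward pointing unit surface normal $\hat{\mathbf{n}}(x, y)$; the vector $\mathbf{s}$ denotes a point light source at infinity, with the magnitude of $\mathbf{s}$ proportional to the intensity of the light source; and $\Psi_{f,\mathbf{s}}(x, y)$ is a binary function such that

$$\Psi_{f,\mathbf{s}}(x, y) = \begin{cases} 0 & \text{if } f(x, y) \text{ is shadowed} \\ 1 & \text{otherwise} \end{cases}$$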

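We now show that shading on a surface $f$ for some light source $\mathbf{s}$ is identical to that on a GBR transformed surface $f'$ for light source $\mathbf{s}' = G\mathbf{s}$ when $f'$ has albedo $a'(x, y)$ given by

$$a' = \frac{a}{g_3} \sqrt{(g_3 n_x - g_1 n_z)^2 + (g_3 n_y - g_2 n_z)^2 + n_z^2} \tag{9}$$

where $\hat{\mathbf{n}} = (n_x, n_y, n_z)^T$. The effect of applying Eq. 9 to a classical bas-relief transformation $0 < g_3 < 1$ is to darken points on the surface where $\mathbf{n}$ points away from the optical axis.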
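A small numerical check makes the role of the albedo in Eq. 9 concrete. The sketch below is an illustration under assumed values (the gradient, light source, albedo, and GBR parameters are hypothetical, as are the function names). It renders one unshadowed Lambertian pixel before and after a GBR transformation, with the light mapped to $G\mathbf{s}$ and the albedo mapped by Eq. 9.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def unit(v):
    norm = math.sqrt(dot(v, v))
    return tuple(c / norm for c in v)

def albedo_gbr(a, n_hat, g1, g2, g3):
    """Transformed albedo of Eq. 9 for unit normal n_hat = (n_x, n_y, n_z)."""
    nx, ny, nz = n_hat
    return (a / g3) * math.sqrt((g3 * nx - g1 * nz) ** 2
                                + (g3 * ny - g2 * nz) ** 2 + nz ** 2)

g1, g2, g3 = 0.4, -0.3, 0.5        # hypothetical GBR parameters
fx, fy, a = 0.3, -0.2, 0.8         # depth gradient and albedo at one pixel
s = (0.5, 0.1, 1.0)                # distant light source

n_hat = unit((-fx, -fy, 1.0))
intensity = a * dot(n_hat, s)      # original pixel value

# Transformed scene: new gradient, new light, new albedo.
n_hat_t = unit((-(g3 * fx + g1), -(g3 * fy + g2), 1.0))
s_t = (s[0], s[1], g1 * s[0] + g2 * s[1] + g3 * s[2])
intensity_t = albedo_gbr(a, n_hat, g1, g2, g3) * dot(n_hat_t, s_t)
```

With the albedo left at `a` instead of `albedo_gbr(...)`, the two intensities would differ by the factor $a'/a$, which is the discrepancy the surrounding discussion describes.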

This transformation on the albedo is subtle and warrants discussion. For $g_3$ close to unity, the transformation on the albedo is nearly impossible to detect. That is, if you transform the shape of a surface by a GBR transformation but leave the albedo unchanged, then the differences in the images produced under varying illumination are too small to reveal the difference in the structure. In Fig. 4, we left the albedo unchanged, $a'(x, y) = a(x, y)$, and even though $g_3$ ranges from 0.5 to 1.5, the differences in shape cannot be discerned from the frontal images. However, when the albedo is unchanged and the flattening is more severe, e.g., tenfold ($g_3 = 0.1$), the shading patterns can reveal the flatness of the surface. This effect is often seen on very low relief sculptures (e.g., Donatello's rilievo schiacciato) which reproduce shadowing accurately, but shading poorly.

Note that for the shadowing to be identical it is necessary that $g_3 > 0$. When $g_3 < 0$, the surface $f'$ is inverted (as in intaglio). For a corresponding transformation of the light source $\mathbf{s}'$, the illuminated regions of the original surface $f$ and the transformed surface $f'$ will be the same if the albedo is transformed as described above. (This is the well known "up/down" (convex/concave) ambiguity.) However, the shadows cast by $f'$ and $f$ may differ quite dramatically.

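Proposition 3. For each light source $\mathbf{s}$ illuminating a Lambertian surface $f(x, y)$ with albedo $a(x, y)$, there exists a light source $\mathbf{s}'$ illuminating a surface $f'(x, y)$ (a GBR transformation of $f$) with albedo $a'(x, y)$ (as given in Eq. 9), such that $I_{f,a,\mathbf{s}}(x, y) = I_{f',a',\mathbf{s}'}(x, y)$.

Proof. The image of $f$ is given by

$$I_{f,a,\mathbf{s}}(x, y) = \Psi_{f,\mathbf{s}}(x, y)\, \mathbf{b}^T(x, y)\, \mathbf{s}$$

For any $3 \times 3$ invertible matrix $A$, we have that

$$I_{f,a,\mathbf{s}}(x, y) = \Psi_{f,\mathbf{s}}(x, y)\, \mathbf{b}^T(x, y)\, A^{-1} A\, \mathbf{s}$$

Since GBR is a subgroup of $GL(3)$ and $\Psi_{f,\mathbf{s}}(x, y) = \Psi_{f',\mathbf{s}'}(x, y)$,

$$\begin{aligned} I_{f,a,\mathbf{s}}(x, y) &= \Psi_{f,\mathbf{s}}(x, y)\, \mathbf{b}^T(x, y)\, G^{-1} G\, \mathbf{s} \\ &= \Psi_{f',\mathbf{s}'}(x, y)\, \mathbf{b}'^T(x, y)\, \mathbf{s}' \\ &= I_{f',a',\mathbf{s}'}(x, y) \end{aligned}$$

where $\mathbf{b}'(x, y) = G^{-T}\mathbf{b}(x, y)$ and $\mathbf{s}' = G\mathbf{s}$.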

Hence, two objects with Lambertian surfaces, differing in shape by a GBR transformation and differing in albedo by Eq. 9, are indeed strongly shading equivalent. The three preceding propositions have shown that when a Lambertian surface $f$ with albedo $a(x, y)$ is illuminated by a single light source, the set of images it can produce by varying the light source strength and direction is equivalent to that produced by a GBR transformed surface $f'$ with albedo $a'(x, y)$ given by Eq. 9. Yet, due to the superposition of images, this result holds not simply for images produced by a single point light source, but also for images produced by any, possibly infinite, combination of point light sources.

These results demonstrate that when both the surface and light source directions are transformed by $G$, both the shadowing and shading are identical in the images of the original and transformed surface. An implication of this result is that given any number of images taken from a fixed viewpoint, neither a computer vision algorithm nor a biological process can distinguish two objects that differ by a GBR transformation. Knowledge (or assumptions) about surface shape, surface albedo, light source direction, or light source intensity must be employed to resolve this ambiguity. See again Fig. 4.

Fig. 4. Three-dimensional data for a human head was obtained using a laser scan (Cyberware) and rendered (top row) as a Lambertian surface with constant albedo (equal grey values for all surface points). The subsequent three rows show images of the head whose shape has been transformed by different generalized bas-relief transformations, but whose albedo has not been transformed. The profile views of the face in the third column reveal the nature of the individual transformations and the direction of the light source. The top row is the true shape; the second from top is a flattened shape ($g_3 = 0.5$) (as in classical bas-relief); the third is an elongated shape ($g_3 = 1.5$); and the bottom is a flattened shape plus an additive plane ($g_3 = 0.7$, $g_2 = 0.5$, and $g_1 = 0.0$). The first column shows frontal views of the faces in the third column. From this view the true 3-d structure of the objects cannot be determined; in each image the shadowing patterns are identical, and even though the albedo has not been transformed according to Eq. 9, the shading patterns are so close as to provide few cues as to the true structure. The second column shows near frontal views of the faces in the first column, after having been separately rotated to compensate for the degree of flattening or elongation. Note that even small rotations appear not to reveal the 3-d structure.

4 Uniqueness of the Generalized Bas-Relief 
Transformation 

Here we prove that under orthographic projection the generalized bas-relief (GBR) transformation is unique in that there is no other transformation of an object's surface which preserves the set of shadows produced by illuminating the object with all possible point sources at infinity. We consider only the simplest case, an object with convex shape casting no shadows on its own surface, and show that the set of attached shadow boundaries is preserved only under a GBR transformation of the object's surface.

Recall that an attached shadow boundary is defined as the contour of points $(x, y, f(x, y))^T$ satisfying $\mathbf{n} \cdot \mathbf{s} = 0$, for some $\mathbf{s}$. For a convex object, the global attached shadow condition holds everywhere. Here the magnitude and the sign of the light source are unimportant, as neither affects the location of the attached shadow boundary. Thus, let the vector $\mathbf{s} = (s_x, s_y, s_z)^T$ denote in homogeneous coordinates a point light source at infinity, where all light sources producing the same attached shadow boundary are equated, i.e., $(s_x, s_y, s_z)^T \equiv (k s_x, k s_y, k s_z)^T$ for all $k \in \mathbb{R}$, $k \neq 0$. With this, the space of light source directions $S$ is equivalent to the real projective plane ($\mathbb{RP}^2$), with the line at infinity given by coordinates of the form $(s_x, s_y, 0)^T$. Note that in Section 2, we represented light sources as points in $\mathbb{RP}^3$; here, we restrict ourselves only to distant light sources lying in the plane at infinity of $\mathbb{RP}^3$ (a real projective plane).

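Let $\mathbf{n} = (n_x, n_y, n_z)^T$ denote the direction of a surface normal. Again, the magnitude and sign are unimportant, so we have $(n_x, n_y, n_z)^T \equiv (k n_x, k n_y, k n_z)^T$ for all $k \in \mathbb{R}$, $k \neq 0$. Thus, the space of surface normals $N$ is likewise equivalent to $\mathbb{RP}^2$. Note that under the equation $\mathbf{n} \cdot \mathbf{s} = 0$, the surface normals are the dual of the light sources. Each point in the $\mathbb{RP}^2$ of light sources has a corresponding line in the $\mathbb{RP}^2$ of surface normals, and vice versa.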

Let us now consider the image contours defined by the points $(x, y)$ satisfying $\mathbf{n} \cdot \mathbf{s} = 0$, for some $\mathbf{s}$. These image contours are the attached shadow boundaries orthographically projected onto the image plane. For lack of a better name, we will refer to them as the imaged attached shadow boundaries.

The set of imaged attached shadow boundaries for a convex object forms an abstract projective plane $\mathbb{P}^2$, where a "point" in the abstract projective plane is a single attached shadow boundary, and a "line" in the abstract projective plane is the collection of imaged attached shadow boundaries passing through a common point in the image plane. To see this, note the obvious projective isomorphism between the real projective plane of light source directions $S$ and the abstract projective plane of imaged attached shadow boundaries $\mathbb{P}^2$. Under this isomorphism, we have bijections mapping points to points and lines to lines.




Fig. 5. The relation of different spaces in the proof of Proposition 4.

Now let us say that we are given two objects whose visible surfaces are described by respective functions $f(x, y)$ and $f'(x, y)$. If the objects have the same set of imaged attached shadow boundaries as seen in the image plane (i.e., if the objects are strongly shadow equivalent), then the question arises: How are the two surfaces $f(x, y)$ and $f'(x, y)$ related?

Proposition 4. If the visible surfaces of two convex objects $f$ and $f'$ are strongly shadow equivalent, then the surfaces are related by a generalized bas-relief transformation.

Proof. As illustrated in Figure 5, we can construct a projective isomorphism between the set of imaged attached shadow boundaries $\mathbb{P}^2$ and the real projective plane of light source directions $S$ illuminating surface $f(x, y)$. The isomorphism is chosen to map the collection of imaged attached shadow boundaries passing through a common point $(x, y)$ in the image plane (i.e., a line in $\mathbb{P}^2$) to the surface normal $\mathbf{n}(x, y)$. In the same manner, we can construct a projective isomorphism between $\mathbb{P}^2$ and the real projective plane of light source directions $S'$ illuminating the surface $f'(x, y)$. The isomorphism is likewise chosen to map the same collection of imaged attached shadow boundaries passing through $(x, y)$ in the image plane to the surface normal $\mathbf{n}'(x, y)$. Under these two mappings, we have a projective isomorphism between $S$ and $S'$, which in turn is a projective transformation (collineation) [1]. Because $N$ and $N'$ are the duals of $S$ and $S'$ respectively, the surface normals of $f(x, y)$ are also related to the surface normals of $f'(x, y)$ by a projective transformation, i.e., $\mathbf{n}'(x, y) = P\,\mathbf{n}(x, y)$ where $P$ is a $3 \times 3$ invertible matrix.







The transformation $P$ is further restricted in that the surface normals along the occluding contour of $f$ and $f'$ are equivalent, i.e., the transformation $P$ pointwise fixes the line at infinity of surface normals. Thus, $P$ must be of the form

$$P = \begin{bmatrix} 1 & 0 & p_1 \\ 0 & 1 & p_2 \\ 0 & 0 & p_3 \end{bmatrix}$$

where $p_3 \neq 0$. The effect of applying $P$ to the surface normals is the same as applying $G$ in Eq. 5 to the surface if $p_1 = -g_1/g_3$, $p_2 = -g_2/g_3$ and $p_3 = 1/g_3$. That is, $P$ has the form of the generalized bas-relief transformation. Note that the shadows are independent of the translation $g_4$ along the line of sight under orthographic projection.
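The correspondence between the normal-space matrix $P$ and the GBR parameters can be verified directly, since the stated values make $P$ equal to $G^{-T}$. The sketch below is a minimal check with hypothetical function names and numeric values; it recovers $(g_1, g_2, g_3)$ from $(p_1, p_2, p_3)$ and forms the product $G^T P$, which should be the identity.

```python
def gbr_from_normal_map(p1, p2, p3):
    """Invert p1 = -g1/g3, p2 = -g2/g3, p3 = 1/g3 for (g1, g2, g3)."""
    if p3 == 0:
        raise ValueError("p3 must be nonzero")
    return (-p1 / p3, -p2 / p3, 1.0 / p3)

def matmul3(A, B):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

p1, p2, p3 = 0.8, -0.6, 2.0                 # hypothetical entries of P
g1, g2, g3 = gbr_from_normal_map(p1, p2, p3)

P = [[1.0, 0.0, p1], [0.0, 1.0, p2], [0.0, 0.0, p3]]
G_T = [[1.0, 0.0, g1], [0.0, 1.0, g2], [0.0, 0.0, g3]]   # transpose of G in Eq. 8
product = matmul3(G_T, P)
```

The offset $g_4$ does not appear because it does not affect surface normals, in line with the closing remark of the proof.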



5 Reconstruction from Attached Shadows 

In the previous section, we showed that under orthographic projection with distant light sources, the only transformation of a surface which preserves the set of imaged shadow contours is the generalized bas-relief transformation. However, Proposition 4 does not provide a prescription for actually reconstructing a surface up to GBR. In this section, we consider the problem of reconstruction from the attached shadow boundaries measured in $n$ images of a surface, each illuminated by a single distant light source. We will show that it is possible to estimate the $n$ light source directions and the surface normals at a finite number of points, all up to GBR. In general, we expect to reconstruct the surface normals at $O(n^2)$ points. From the reconstructed normals, an approximation to the underlying surface can be computed for a fixed GBR. Alternatively, existing shape-from-shadow methods can be used to reconstruct the surface from the estimated light source directions (for a fixed GBR) and from the measured attached and cast shadow curves [11, 17, 31].

First, consider the occluding contour (silhouette) of a surface, which will be denoted $C_0$. This contour is equivalent to the attached shadow produced by a light source whose direction is the viewing direction. Define a coordinate system with $\hat{\mathbf{x}}$ and $\hat{\mathbf{y}}$ spanning the image plane, and with $\hat{\mathbf{z}}$ pointing in the viewing direction. For all points $\mathbf{p}$ on the occluding contour, the viewing direction lies in the tangent plane (i.e., $\mathbf{n}(\mathbf{p}) \cdot \hat{\mathbf{z}} = 0$), and the surface normal $\mathbf{n}(\mathbf{p})$ is parallel to the image normal. Hence if the normal to the image contour is $(n_x, n_y)^T$, the surface normal is $\mathbf{n} = (n_x, n_y, 0)^T$. In $\mathbb{RP}^2$, the surface normals to all points on the occluding contour correspond to the line at infinity.

Now consider the attached shadow boundary $C_1$ produced by a light source whose direction is $\mathbf{s}_1$. See Figure 6.a. For all points $\mathbf{p} \in C_1$, $\mathbf{s}_1$ lies in the tangent plane, i.e., $\mathbf{s}_1 \cdot \mathbf{n}(\mathbf{p}) = 0$. Where $C_1$ intersects the occluding contour, the normal $\mathbf{n}_1$ can be directly determined from the measured contour as described above. It should be noted that while $C_1$ and the occluding contour intersect transversally on the surface, their images generically share a common tangent and form the crescent moon image singularity. Note that by measuring $\mathbf{n}_1$ along the occluding contour, we obtain a constraint on the light source direction, $\mathbf{s}_1 \cdot \mathbf{n}_1 = 0$. This restricts the light source to a line in $\mathbb{RP}^2$ or to a great circle on the illumination sphere $S^2$. The source $\mathbf{s}_1$ can be expressed parametrically in the camera coordinate system as

$$\mathbf{s}_1(\theta_1) = \cos\theta_1\,(\mathbf{n}_1 \times \hat{\mathbf{z}}) + \sin\theta_1\,\hat{\mathbf{z}}$$

Fig. 6. Reconstruction up to GBR from attached shadows: For a single object in fixed pose, these figures show superimposed attached shadow contours $C_i$ for light source directions $\mathbf{s}_i$. The surface normal where $C_i$ intersects the occluding contour is denoted by $\mathbf{n}_i$. The normal at the intersection of $C_i$ and $C_j$ is denoted by $\mathbf{n}_{i,j}$. a) The three contours intersect at three points in the image. b) The three contours meet at a common point, implying that $\mathbf{s}_1$, $\mathbf{s}_2$ and $\mathbf{s}_3$ lie on a great circle of the illumination sphere. c) Eight attached shadow boundaries of which four intersect at $\mathbf{p}_{1,2}$ and four intersect at $\mathbf{p}_{1,3}$; the direction of the light sources $\mathbf{s}_1, \ldots, \mathbf{s}_8$ and the surface normals at the intersection points can be determined up to GBR. d) The structure of the illumination sphere $S^2$ for the light source directions generating the attached shadow boundaries in Fig. 6.c.

From the shadows in a single image, it is not possible to further constrain $\mathbf{s}_1$, nor does it seem possible to obtain any further information about points on $C_1$.

Now, consider a second attached shadow boundary $C_2$ formed by a second light source direction $\mathbf{s}_2$. Again, the measurement of $\mathbf{n}_2$ (where $C_2$ intersects $C_0$) determines a projective line in $\mathbb{RP}^2$ (or a great circle on $S^2$) that the light source $\mathbf{s}_2$ must lie on. In general, $C_1$ and $C_2$ will intersect at one or more visible surface points. If the object is convex and the Gauss map is bijective, then they only intersect at one point $\mathbf{p}_{1,2}$. For a non-convex surface, $C_1$ and $C_2$ may intersect more than once. However, in all cases, the direction of the surface normal $\mathbf{n}_{1,2}$ at the intersections is

$$\mathbf{n}_{1,2} = \mathbf{s}_1(\theta_1) \times \mathbf{s}_2(\theta_2) \tag{10}$$
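The two-image step can be sketched in a few lines of Python. The code below is an illustration only: the measured contour normals `n1` and `n2`, the angles, and the function names are hypothetical values chosen for the example. It places each source on the great circle fixed by its measured occluding-contour normal and then evaluates Eq. 10 for the normal at the intersection of the two shadow boundaries.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def source_on_great_circle(n_i, theta):
    """s_i(theta) = cos(theta) (n_i x z) + sin(theta) z, with n_i = (n_x, n_y, 0)."""
    z = (0.0, 0.0, 1.0)
    b = cross(n_i, z)
    return tuple(math.cos(theta) * b[k] + math.sin(theta) * z[k] for k in range(3))

n1 = (1.0, 0.0, 0.0)   # hypothetical normal where C_1 meets the occluding contour
n2 = (0.0, 1.0, 0.0)   # hypothetical normal where C_2 meets the occluding contour

s1 = source_on_great_circle(n1, 0.3)   # theta_1 is an unknown; 0.3 is a placeholder
s2 = source_on_great_circle(n2, 1.1)   # theta_2 likewise
n12 = cross(s1, s2)                    # Eq. 10: normal direction at the intersection
```

By construction each `s_i` is orthogonal to its `n_i`, and `n12` is orthogonal to both sources, which is exactly the tangency condition both shadow boundaries impose at their common point. The angles remain free parameters, reflecting the ambiguity discussed in the text.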

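Thus, from the attached shadows in two images, we directly measure $\mathbf{n}_1$ and $\mathbf{n}_2$ and obtain estimates for $\mathbf{n}_{1,2}$, $\mathbf{s}_1$, and $\mathbf{s}_2$ as functions of $\theta_1$ and $\theta_2$.

Consider a third image illuminated by $\mathbf{s}_3$, in which the attached shadow boundary $C_3$ does not pass through $\mathbf{p}_{1,2}$ (Fig. 6.a). Again, we can estimate a projective line (great circle on $S^2$) containing $\mathbf{s}_3$. We also obtain the surface normal at two additional points, the intersections of $C_3$ with $C_1$ and $C_2$. From the attached shadow boundaries for a convex surface measured in $n$ images, if no three contours intersect at a common point, the surface normal can be determined at $n(n - 1)$ points as a function of $n$ unknowns $\theta_i$, $i = 1 \ldots n$.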

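However, the number of unknowns can be reduced when three contours intersect at a common point. Consider Fig. 6.b, where contour $C_4$ intersects $C_1$ and $C_2$ at $\mathbf{p}_{1,2}$. In this case, we can infer from the images that $\mathbf{s}_1$, $\mathbf{s}_2$ and $\mathbf{s}_4$ all lie in the tangent plane to $\mathbf{p}_{1,2}$. In $\mathbb{RP}^2$, this means that $\mathbf{s}_1$, $\mathbf{s}_2$, $\mathbf{s}_4$ all lie on the same projective line. Since $\mathbf{n}_4$ can be measured, $\mathbf{s}_4$ can be expressed as a function of $\theta_1$ and $\theta_2$, i.e.,

$$\mathbf{s}_4(\theta_1, \theta_2) = \mathbf{n}_4 \times (\mathbf{s}_1(\theta_1) \times \mathbf{s}_2(\theta_2))$$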

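Thus, a set of attached shadow curves ($C_1$, $C_2$, $C_4$ in Fig. 6.c) passing through a common point ($\mathbf{p}_{1,2}$) is generated by light sources ($\mathbf{s}_1$, $\mathbf{s}_2$, $\mathbf{s}_4$ in Fig. 6.d) located on a great circle of $S^2$. The light source directions can be determined up to two degrees of freedom, $\theta_1$ and $\theta_2$. Now, if in addition a second set of light sources lies along another projective line (the great circle in Fig. 6.d containing $\mathbf{s}_1$, $\mathbf{s}_3$, $\mathbf{s}_6$, $\mathbf{s}_7$), the corresponding shadow contours ($C_1$, $C_3$, $C_6$, $C_7$ in Fig. 6.c) intersect at another point on the surface ($\mathbf{p}_{1,3}$). Again, we can express the location of light sources ($\mathbf{s}_6$, $\mathbf{s}_7$) on this great circle as functions of the locations of two other sources ($\mathbf{s}_1$ and $\mathbf{s}_3$):

$$\mathbf{s}_i(\theta_1, \theta_3) = \mathbf{n}_i \times (\mathbf{s}_1(\theta_1) \times \mathbf{s}_3(\theta_3))$$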

Since $\mathbf{s}_1$ lies at the intersection of both projective lines, we can estimate the direction of any light source located on either line up to just three degrees of freedom, $\theta_1$, $\theta_2$, and $\theta_3$. Furthermore, the direction of any other light source ($\mathbf{s}_8$ in Fig. 6.d) can be determined if it lies on a projective line defined by two light sources whose directions are known up to $\theta_1$, $\theta_2$ and $\theta_3$. From the estimated light source directions, the surface normal can be determined using Eq. 10 at all points where the shadow curves intersect. As mentioned earlier, there are $O(n^2)$ such points; observe the number of intersections in Fig. 6.c. It is easy to verify algebraically that the three degrees of freedom $\theta_1$, $\theta_2$ and $\theta_3$ correspond to the degrees of freedom in GBR: $g_1$, $g_2$ and $g_3$. The translation $g_4$ of the surface along the line of sight cannot be determined under orthographic projection.







6 Discussion 

We have defined notions of shadow equivalence for objects, showing that two
objects differing by a four-parameter family of projective transformations
(GPBR) are shadow equivalent under perspective projection. Furthermore, under
orthographic projection, two objects differing by a generalized bas-relief
(GBR) transformation are strongly shadow equivalent, i.e., for any light source
illuminating an object, there exists a light source illuminating a transformed
object such that the shadows are identical. We have proven that GBR is the only
transformation having this property. While we have shown that the occluding
contour is also preserved under GPBR and GBR, it should be noted that image
intensity discontinuities (step edges) arising from surface normal
discontinuities or albedo discontinuities are also preserved under these
transformations, since these points move along the line of sight and are
viewpoint and (generically) illumination independent. Consequently, edge-based
recognition algorithms should not be able to distinguish objects differing by
these transformations, nor should edge-based reconstruction algorithms be able
to perform Euclidean reconstruction without additional information.

In earlier work where we concentrated on light sources at infinity [4, 3], we
showed that for any set of point light sources, the shading as well as the
shadowing of an object with Lambertian reflectance are identical to the shading
and shadowing of any generalized bas-relief transformation of the object, i.e.,
the illumination cones [4] are identical. This is consistent with the
effectiveness of well-crafted relief sculptures in conveying a greater sense of
the depth than is present. It is clear that shading is not preserved for GPBR
or for GBR when the light sources are proximal; the image intensity falls off
by the reciprocal of the squared distance between the surface and light source,
and distance is not preserved under these transformations. Nonetheless, for a
range of transformations and for some sets of light sources, it is expected
that the intensity may only vary slightly.

Furthermore, we have shown that it is possible to reconstruct a surface up to
GBR from the shadow boundaries in a set of images. To implement a
reconstruction algorithm based on the ideas in the preceding section requires
detection of cast and attached shadow boundaries. While detection methods have
been presented [29], it is unclear how effective these techniques would be in
practice. In particular, attached shadows are particularly difficult to detect
and localize since, for a Lambertian surface with constant albedo, there is a
discontinuity in the intensity gradient or shading flow field, but not in the
intensity itself. On the other hand, there is a step edge at a cast shadow
boundary, and so extensions of the method described there which use
information about cast shadows to constrain the light source direction may lead
to practical implementations.
Leonardo da Vinci's statement that shadows of relief sculpture are
"foreshortened" is, strictly speaking, incorrect. However, reliefs are often
constructed in a manner such that the cast shadows will differ from those
produced by sculpture in the round. Reliefs have been used to depict narratives
involving numerous figures located at different depths within the scene. Since
the sculpting medium is usually not thick enough for the artist to sculpt the
figures to the proper relative depths, sculptors like Donatello and Ghiberti
employed rules of perspective to determine the size and location of figures,
sculpting each figure to the proper relief [16]. While the shadowing for each
figure is self consistent, the shadows cast from one figure onto another are
incorrect. Furthermore, the shadows cast onto the background, whose orientation
usually does not correspond to that of a wall or floor in the scene, are also
inconsistent. Note, however, that ancient Greek sculpture was often painted; by
painting the background of the Parthenon Frieze dark blue [7], cast shadows
would be less visible and the distortions less apparent. Thus, Leonardo's
statement is an accurate characterization of complex reliefs such as Ghiberti's
East Doors on the Baptistery in Florence, but does not apply to figures
sculpted singly.

Acknowledgments

Many thanks to David Mumford for leading us to the proof of Proposition 4.
P. N. Belhumeur was supported by a Presidential Early Career Award and NSF
Career Award IRI-9703134, and ARO grant DAAH04-95-1-0494. D. J. Kriegman
was supported by NSF under NYI IRI-9257990. A. L. Yuille was supported by ARO
grant DAAH04-95-1-0494 and NSF grant IRI-93-17670.

References 

1 .rtn. t .ntr n ulhrn.Nw ork 1957. 

2.xnll. t t. InvrtyrNwvn 

1995. 

3 . Ihum ur . Kr gm n n . u 11 . h -r 1 m gu ty. n P 

P tt 1997. n pr . 

. N. Ihum ur n . . Kr gm n. ht th tomg o nojt 

un r 11 po 11 ght ng on t on . n P 

P tt p g 270 277 1996. 

5 . r ton n . u k r. h ow n h ng flow fi 1 . n P 6 p g 

7 2 7 9 1996. 

6 . h ng n K. hi. 1 m t ng th u 1 ng h ght n ty rom th h ow 

n p n hrom t pot- mg. It to 2 ulng. 16(3) 09 15 

1995. 

7 . . ook. . rv r n v r ty r m r g 19 . 

L. on t n N. tolfi. ngul r t o Hum n t ur . t t 

23(3) 207 216 1997. 

9 . ug r . tr t fl t on o 3- V on roj t V flTm n m tr r pr n- 

t t on . t 12(7) 65 1995. 

10 . rmull r n . lo mono . rnlrpr nttonovulp .nP 

t p g 97 90 1996. 

11 . tz th o orou. h r v t on o 3- ur h p rom h ow . n P 

t p g 1012 1020 19 9. 

12 . orn. t . r m r g . 19 6. 







13 . urt n .Nvt. ttono ulngn rlmg unghp 

nhow.nP t t t t pg 1099 1103 19 3. 

1 . rv n n . K own. tho or xplo t ng th r 1 t on h p tw n u 1 - 

ng n th r h ow n r 1 m g ry. t t 

19(6) 156 1575 19 9. 

15 . K mp tor. P t . 1 nvrtyr Nw vn 

19 9. 

16 . K mp. t t t t 

t t. 1 nvrtyr Nw vn 1990. 

17 . K n r n . m th. h p rom rkn . n ^ t 

p g 539 5 6 19 7. 

1 . Ko n r nk n . n oorn. fhn tru tur rom mot on. 

(2) 377 3 5 1991. 

19 . L m rt. P t t t t 

rh r K1 tt 1760. 

20 . L ng r n . u k r. h t 1 ght our ? n P 

P tt p g 172 17 1997. 

21 . on . t n ng 3- rom h ow n rlmg. n P 5pg 
73 76 19 3. 

22 . un y n . rm n. t t . 

r 1992. 

23 .In. nun nhp rom h ng. t t 6(2) 75 

10 un 1991. 

2 . o nholtz n . Ko n r nk. fhn tru tur n photom try. n P 

P tt p g 790 795 1996. 

25 L. h p ro . rm n n . r y. 3 mot on r ov ry v fhn p pol r 

g om try. t t 16(2) 17 12 to r 1995. 

26 . h hu . t P t t 3 t . h th 

1992. 

27 . Iv r. ^ fit t . h th 

m r g 19 0. 

2 . 11m n n . r . ogn t on y 1 n r om n t on o mo 1 . 

P tt t 13 992 1006 1991. 

29 . tk n. nt n ty- g 1 fit on. n P t t 

p g 36 1 19 2. 

30 . oo h m. n ly ng m g o urv nr . ^ t 17 117 

1 0 19 1. 

31 . ng n . K n r. h p rom h ow un r rror. n P 

t p g 10 3 1090 1993. 




Grouping in the Normalized Cut Framework 



Jitendra Malik, Jianbo Shi, Serge Belongie and Thomas Leung

Computer Science Division, University of California at Berkeley

Berkeley, CA 94720

{malik, jshi, sjb, leungt}@cs.berkeley.edu



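Abstract. In this paper, we study low-level image segmentation in the
normalized cut framework proposed by Shi and Malik (1997). The goal
is to partition the image from a "big picture" point of view. Perceptually
significant groups are detected first while small variations and details
are treated later. Different image features (intensity, color, texture,
contour continuity, motion and stereo disparity) are treated in one uniform
framework. We suggest directions for intermediate-level grouping operating on
the output of the low-level segmentation.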

1 Introduction 

Suppose we are interested in the recognition of objects in complex backgrounds.
A canonical example is finding a leopard in the dappled light of a jungle.

In addition to attaching semantic labels (leopard, tree, etc.) at some stage of
visual perception, we also have an awareness of which pixels in the image
belong together: the spots on one region, the leaves of the tree another, and
so on. Whether such a grouping of pixels into regions that belong to a single
object is a necessary precursor to recognition or is a consequence of
recognition has been the subject of much debate in both human vision and
machine vision. We view this to be a false dichotomy (grouping is aided by
recognition and vice versa), and the most fruitful approach to adopt is one in
which grouping and recognition are intertwined processes. The successful
application of Hidden Markov Models in speech to simultaneously guide grouping
and recognition suggests the power of such an approach.

Such a framework is not yet available for vision. However, we can sketch what
must be essential components: an integrated treatment of

1. low-level cues: coherence of brightness, color and texture;

2. intermediate level cues: symmetry, parallelism, repeated structure and
convexity; as well as

3. high level cues: specific object knowledge or context information.

This view of hierarchical grouping was first emphasized by researchers in the
Gestalt school of psychology in the early part of the twentieth century. They
stressed the various factors of grouping: proximity, similarity, good
continuation, symmetry, parallelism, convexity, common fate and familiar
configuration.

D.A. Forsyth et al. (Eds.): Shape, Contour LNCS 1681, pp. 155—164, 1999. 

© Springer- Verlag Berlin Heidelberg 1999 







Though they did not have a mathematically precise formulation, it is fair to
say that they had a strong intuitive sense of what is important about the
problem.

It is important to note that the Gestalt factors are not rules; instead they
have probabilistic interpretations. The factor of proximity only means that
pixels nearby are more likely to belong to the same group. Obviously it is
absurd to say that nearby pixels always belong together. Moreover, a
probabilistic framework can naturally resolve conflicting cues and allow for
multiple hypotheses.

In computer vision, there has not yet been a successful demonstration of a
unified architecture for grouping and recognition which combines the various
low-level, intermediate-level and high-level cues in an effective manner. In
this paper, we review normalized cuts, which provide a unified framework for
the low-level cues. We also outline how intermediate level grouping could
operate on the results of normalized cuts. How to incorporate high-level
knowledge remains very much an open question.

In the past, low-level cues have typically been treated in isolation, one or
two of the cues at a time. We will review some representative examples here.
The most widely used segmentation algorithm is edge detection [3]. An edge
detector marks all the pixels where there are big discontinuities in intensity,
color or texture. The cue of contour continuity is exploited to link the edgels
together to form long contours [9]. Texture information is encoded as the
responses to a set of linear filters [6]. Another formulation of segmentation
is the variational formulation, i.e. pixel similarities are defined locally,
but the final segmentation is obtained by optimizing a global functional [8].
For motion segmentation, one popular algorithm is the motion layer approach:
the goal is to simultaneously estimate multiple global motion models and their
spatial supports. The Expectation-Maximization (EM) algorithm allows one to
achieve this goal.

Our approach is based on the normalized cut framework proposed by Shi and
Malik [11]. The goal is to partition the image from a "big picture" point of
view. Perceptually significant groups are detected first while small variations
and details are treated later. Different image features (intensity, color,
texture, contour continuity, motion and stereo disparity) are treated in one
uniform framework.



2 Segmentation Using Normalized Cuts 

In this section, we review the normalized cut framework for grouping proposed
by Shi and Malik in [11]. Shi and Malik formulate visual grouping as a graph
partitioning problem. The nodes of the graph are the entities that we want to
partition; for example, in image segmentation they will be the pixels; in video
segmentation they will be space-time triplets. The edge between two nodes
corresponds to the strength with which these two nodes belong to one group;
again, in image segmentation, the edges of the graph will correspond to how
much two pixels agree in intensity, color, etc.; while in motion segmentation,
the edges describe the similarity of the motion. Intuitively, the criterion for
partitioning the graph will be to minimize the sum of weights of connections
across the groups and maximize the sum of weights of connections within the
groups.

Let $G = \{V, E\}$ be a weighted undirected graph, where $V$ are the nodes and
$E$ are the edges. Let $A, B$ be a partition of the graph: $A \cup B = V$,
$A \cap B = \emptyset$. In graph theoretic language, the similarity between
these two groups is called the cut:

\[ \mathrm{cut}(A, B) = \sum_{u \in A,\, v \in B} w(u, v) \]

where $w(u, v)$ is the weight on the edge between nodes $u$ and $v$. Shi and
Malik proposed to use a normalized similarity criterion to evaluate a
partition. They call it the normalized cut:

\[ \mathrm{Ncut}(A, B) = \frac{\mathrm{cut}(A, B)}{\mathrm{asso}(A, V)} + \frac{\mathrm{cut}(B, A)}{\mathrm{asso}(B, V)} \]

where $\mathrm{asso}(A, V) = \sum_{u \in A,\, t \in V} w(u, t)$ is the total
connection from nodes in $A$ to all the nodes in the graph. For more discussion
on this criterion, please refer to [11].
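
As a concrete illustration of the two definitions above, the following minimal sketch evaluates cut, asso and Ncut on a toy graph. The graph (two triangles joined by one weak edge), its weights, and the function names are assumptions made for this example only; they do not come from [11].

```python
def cut(W, A, B):
    """Total weight of the edges running between node sets A and B."""
    return sum(W[u][v] for u in A for v in B)

def asso(W, A):
    """Total connection from the nodes in A to every node in the graph."""
    return sum(W[u][t] for u in A for t in range(len(W)))

def ncut(W, A, B):
    """Normalized cut of the partition (A, B)."""
    return cut(W, A, B) / asso(W, A) + cut(W, B, A) / asso(W, B)

# Hypothetical symmetric weight matrix: triangles {0,1,2} and {3,4,5}
# with unit weights, joined by a single weak edge between nodes 2 and 3.
W = [[0.0] * 6 for _ in range(6)]
edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
         (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0), (2, 3, 0.1)]
for i, j, w in edges:
    W[i][j] = W[j][i] = w

balanced = ncut(W, [0, 1, 2], [3, 4, 5])      # cuts only the weak edge
lopsided = ncut(W, [0], [1, 2, 3, 4, 5])      # isolates a single node
print(balanced, lopsided)
```

The balanced split scores far lower than the split that isolates a single node, which is the behaviour the normalization is meant to produce: cutting off a small set is penalized because its asso term is small.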

One key advantage of using the normalized cut is that a good approximation to
the optimal partition can be computed very efficiently.^1 Let $W$ be the
association matrix, i.e. $W_{ij}$ is the weight between nodes $i$ and $j$ in
the graph. Let $D$ be the diagonal matrix such that $D_{ii} = \sum_j W_{ij}$,
i.e. $D_{ii}$ is the sum of the weights of all the connections to node $i$.
Shi and Malik showed that the optimal partition can be found by computing:

\[ y = \arg\min \mathrm{Ncut} = \arg\min_{y} \frac{y^{T}(D - W)\,y}{y^{T} D\, y} \qquad (1) \]

where $y \in \{a, b\}^N$ is a binary indicator vector specifying the group
identity of each pixel, i.e. $y_i = a$ if pixel $i$ belongs to group $A$ and
$y_j = b$ if pixel $j$ belongs to $B$. $N$ is the number of pixels. Notice that
the above expression is the Rayleigh quotient. If we relax $y$ to take on real
values (instead of two discrete values), we can optimize Equation 1 by solving
a generalized eigenvalue system. Efficient algorithms with polynomial running
time are well-known for solving such problems. Therefore, we can compute an
approximation to the optimal partition very efficiently. For details of the
derivation of Equation 1, please refer to [11].
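
The relaxation can be sketched in a few lines. The example below is an illustration only, not the implementation of [11]: it substitutes $z = D^{1/2} y$ to turn the generalized problem $(D - W)y = \lambda D y$ into a standard symmetric one, and uses plain power iteration with deflation where a practical system would use a sparse eigensolver. The toy graph, the function name and the split at zero are all assumptions.

```python
import math

def ncut_bipartition(W, iters=500):
    """Approximate the minimum-Ncut split of a graph with symmetric weights W."""
    n = len(W)
    d = [sum(row) for row in W]                 # D_ii = sum_j W_ij (must be > 0)
    s = [math.sqrt(x) for x in d]
    # S = (I + D^{-1/2} W D^{-1/2}) / 2 shares eigenvectors with the normalized
    # Laplacian and has eigenvalues in [0, 1], so power iteration is safe.
    S = [[0.5 * ((i == j) + W[i][j] / (s[i] * s[j])) for j in range(n)]
         for i in range(n)]
    z0 = [x / math.sqrt(sum(d)) for x in s]     # trivial top eigenvector D^{1/2} 1
    z = [float(i + 1) for i in range(n)]        # arbitrary deterministic start
    for _ in range(iters):
        proj = sum(a * b for a, b in zip(z, z0))
        z = [a - proj * b for a, b in zip(z, z0)]          # deflate trivial mode
        z = [sum(S[i][j] * z[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(a * a for a in z))
        z = [a / norm for a in z]
    y = [z[i] / s[i] for i in range(n)]         # back-substitute y = D^{-1/2} z
    A = [i for i in range(n) if y[i] > 0]
    B = [i for i in range(n) if y[i] <= 0]
    return y, A, B

# Hypothetical graph: two unit-weight triangles joined by one weak edge.
W = [[0.0] * 6 for _ in range(6)]
for i, j, w in [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
                (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0), (2, 3, 0.1)]:
    W[i][j] = W[j][i] = w

y, A, B = ncut_bipartition(W)
print(sorted([A, B]))
```

The sign of the relaxed vector is arbitrary, so only the pair of groups is meaningful, not which one is called A.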



3 The Mass- Spring Analogy 

As we have just seen, the Normalized Cut algorithm requires the solution of a
generalized eigensystem involving the weighted adjacency matrix. In this
section, we develop the intuition behind this process by considering a physical
interpretation of the eigensystem as a mass-spring system.

^1 Finding the true optimal partition is an NP-complete problem.

One can readily verify that the symmetric positive semi-definite matrix
$(D - W)$, known in graph theory as the Laplacian of the graph $G$, corresponds
to a stiffness matrix, while the diagonal positive semi-definite matrix $D$
represents a mass matrix. These matrices are typically denoted by $K$ and $M$,
respectively, and appear in the equations of motion as

\[ M\ddot{x}(t) = -Kx(t). \]

If we assume a solution of the form $x(t) = v_k \cos(\omega_k t + \phi)$, we
obtain the following generalized eigenvalue problem for the time-independent
part:

\[ K v_k = \omega_k^2 M v_k \]

in analogy to Equation (1).
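
The step from the equations of motion to the eigenvalue problem is short enough to spell out. The derivation below is a sketch that assumes the identification $K = D - W$, $M = D$ stated above and a single harmonic mode with phase $\phi$:

```latex
\begin{align*}
x(t) &= v_k \cos(\omega_k t + \phi)
  &&\text{(assumed harmonic solution)} \\
\ddot{x}(t) &= -\omega_k^2\, v_k \cos(\omega_k t + \phi) \\
M\ddot{x}(t) = -K x(t)
  \;&\Longrightarrow\;
  -\omega_k^2\, M v_k \cos(\omega_k t + \phi) = -K v_k \cos(\omega_k t + \phi) \\
  &\Longrightarrow\; K v_k = \omega_k^2\, M v_k \\
  &\Longrightarrow\; (D - W)\, v_k = \omega_k^2\, D\, v_k
  &&\text{(with } K = D - W,\ M = D\text{)}
\end{align*}
```

The last line is the stationarity condition of the Rayleigh quotient in Equation (1), with the squared frequency $\omega_k^2$ playing the role of the generalized eigenvalue, so low-frequency modes correspond to low-cost relaxed partitions.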

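The intuition is that each pixel represents a mass and each connection weight
represents a Hooke spring constant. If the system is shaken, tightly connected
groups of pixels will tend to shake together.

In light of this connection, the generalized eigenvectors in Equation (1)
represent normal modes of vibration of an equivalent mass-spring system based
on the pairwise pixel similarities.^2 For illustrative purposes, a few normal
modes for the landscape test image in Figure 2 are shown in Figure 1, together
with a snapshot of a superposition of the modes.

Using the mass-spring analogy, one can proceed to define a measure of
similarity within the space of modes by considering the maximum extension of
each spring over all time. We refer to this as the inter-group distance. As
described in [2], the inter-group distance between two pixels $i$ and $j$ may
be defined as the following weighted $L_1$ norm:

\[ d_{IG}(i, j) = \sum_{k} \frac{1}{\omega_k} \left| v_k^{i} - v_k^{j} \right| \qquad (2) \]

Since the springs have large extension between groups and small extension
within groups, an obvious application of the inter-group distance is to define
a measure of local "edginess" at each pixel. Please refer to [2] for a more
detailed discussion of this idea.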



4 Local Image Features 

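In region-based segmentation algorithms, similarities between pixels are
encoded locally and there is a global routine that makes the decision of
partitioning. In the normalized cut framework, local pixel similarities are
encoded in the weight matrix $W$ discussed in Section 2. In this section, we
will describe how local pixel similarities are encoded to take into account the
factors of similarity in intensity, color and texture; contour continuity; and
common motion (or common disparity in stereopsis).

^2 Since we use free boundary conditions around the edges of the image,
we ignore the first mode since it corresponds to uniform translation.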





Fig. 1. Three generalized eigenvectors ($v_2$, $v_3$ and $v_4$) for the
landscape test image are shown in (a)-(c). As an illustration of the connection
between Normalized Cuts and the analysis of mass-spring systems, a
superposition of the modes at an arbitrary time instant is shown in (d) as a
surface plot.



4.1 Brightness, Color, and Texture 

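Let's look at how we measure pixel similarities due to brightness, color and
texture. Texture information is measured as the responses to a set of zero-mean
difference of Gaussian (DOG) and difference of offset Gaussian (DOOG) kernels,
similar to those used for texture analysis in [6]. We call the vector of filter
responses the texture feature vector:
$u_{tex} = (f_1 * I, f_2 * I, \ldots, f_N * I)^T$. Intensity and color are
measured using histograms with soft binning. We write the intensity/color
feature vector as $u_{col}$. The combined texture and intensity/color feature
vector at pixel $i$ is thus given by $u_i = (u_{tex}^T, u_{col}^T)^T$. This
feature vector is normalized to have $L_2$ norm equal to 1:
$\hat{u}_i = u_i / \|u_i\|$. Notice that $\|u_{col}\|$ is approximately equal
to a constant. The normalization step can then be seen as a form of gain
control, which diminishes the contribution of the intensity/color components
when there is a lot of activity in the texture components. The dissimilarity
between two pixels is then defined as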

(^teXjCol ("^2 ^ (^2 ) 



4.2 Contour Continuity 

n o m tion out u vilin ontinuity n 1 o in o po t into th imi- 
1 ity mu tw n two pi 1 . ontou in o m tion n omput “ o tly 
th ough orientation energy 7 {OE{x)). O i nt tion n gy i t ong t n - 
t n ontou o h p ont t whil it will w k t low ont t g p 



long th ontou . nh n th o i nt tion n gy t low ont t g p y 
p op g ting th n gy om n igh o ing pi 1 long n t n ontou . h 

p o ility o p op g tion i iv om th n gy o th elastica u v om- 
pl tion mo 1 13 . O i nt tion n gy t p op g tion p ovi u with o t 

in o m tion out th p no ontou . ntuitiv ly th to o u vilin 

ontinuity y th t two pi 1 long to two i nt g oup i th i ontou 

p ting th m. h i imil ity i t ong i th ontou i t n . O i n- 

t tion n gy How u to ptu thi notion v y ily. iv n pi \ pi n 

P 2 i imil ity tw n th m i n to high 5 i th o i nt tion n gy 

long th lin joining th m i t ong. hu i H th t ight lin tw n p\ 

n p 2 n a: i pi 1 on / w n th i imil ity u to ontou ontinuity 

dedg{pi,P 2 ) m -OE{x) - 0.5{OE{pi) + OE{p 2 ))~ 

X—l 

n It n tiv to thi nition on n t i t th v lu tion o th 
o i nt tion n gy to point lying on g ontou . h g ontou n 
t t n lo liz u ing o mpl m im o o i nt n gy 10 . u h 

nition 1 to h p gm nt tion t th p n o m 11 mount o 
omput tion. 

4.3 Motion and Stereo Disparity 

For motion segmentation (or binocular segmentation of a stereo pair), the nodes
of the graph are the triplets $(x, y, t)$, where $(x, y)$ denotes image
location and $t$ is time. The weight between two nodes is the similarity of the
motion at the two pixel locations at that time. We propose to compute the
weights softly through motion profiles. Instead of trying to determine exactly
where each pixel moves to in the next frame (as in optical flow), we compute a
probability distribution over the locations where the pixel might move to.
Similarity between two nodes is then measured as the similarity of the motion
profiles.

This technique can be made computationally efficient for long image sequences
by considering only a fixed number of image frames centered around each
incoming image frame in the time domain to compute the segmentation. Because
there is a significant overlap of the image frames used to compute the
segmentation from one time step to another, we can use it to our advantage to
speed up our computation. Specifically, when solving the generalized
eigensystem using the Lanczos method, the eigenvectors from a previous time
step can provide us with a good guess for the initial vectors at the next time
step, and we can arrive at the solution very quickly. An example of the motion
segmentation results for the flower garden sequence is shown in Fig. 4. For
details, please refer to [12].

4.4 Results 

Results are shown in Figure 2 using texture and intensity, and in Figure 3
using contour continuity. For more results, the reader is encouraged to look at
our web site http://www.cs.berkeley.edu/~jshi/Grouping/.








Fig. 2. Segmentation using intensity and texture. Original image shown on the
left and the segments on the right.




















Fig. 3. Segmentation based on intensity and contour continuity. Left: original
image; middle: segments; right: boundaries of segments.

5 Discussion 

While normalized cuts provide the basic regions, we are interested in going
further and exploiting intermediate level grouping cues as well. We sketch an
outline of how this could proceed.

The traditional way to extend the grouping mechanism to intermediate grouping
cues would be to take the curves of the segmented regions (or edges from a
simple edge detector) as given and try to find symmetries, parallelisms, or to
look for junctions to reason about occlusion. This view is unsatisfactory.
Intermediate level grouping can and should affect image segmentation. Low-level
grouping is to provide "hints" to invoke intermediate level grouping. A pathway
to go back and change the segmentation should be provided. However, there is an
intrinsic difference in both the representation and computational mechanisms
between low-level and intermediate level grouping. Low-level grouping operates
in a continuous domain (pixels) and, at least in our framework, is
deterministic (given a graph, compute the best segmentation), while
intermediate-level grouping operates in a discrete symbolic domain (regions and
curves) and needs to be probabilistic to allow for multiple interpretations.
(Resolving the different interpretations involves higher-level knowledge in the
form of object specific knowledge or domain and context information.)












Fig. 4. u plo ( ) how h of h x f of h “flow g 

long w h h g n on. h o g n 1 g z 120 x 1 n 

ofz3x3 u oonu hp ong ph. h of h 

onn ooh h 1 hn uppxln3 gf 

( ) how hlhohlhf ofh qun n h oc 
u ng k ng Igo h w h h 1 ng w n ow ho . 



g n qu n 

n g p h 
g p h 

w y. u plo 
o on g n on 



Our approach is to represent the probabilistic symbolic world as a Markov
Random Field (MRF) while maintaining links to the graph representation of the
segmentation. Grouping purely based on low-level cues is used to construct the
nodes of the MRF, which correspond to regions in the image produced by
normalized cuts as well as their bounding curves. Each configuration of values
of the random variables in the MRF is a different interpretation of the scene
and can be mapped to a particular grouping. Incidence of intermediate level
grouping factors defines the clique potentials associated with the MRF and thus
models different configurations being more or less likely. Markov Chain Monte
Carlo (MCMC) methods allow us to compute the probability distribution of the
configurations efficiently by sampling.

Working out the details of such an architecture, and how to incorporate object
category specific knowledge in this framework, promises to be a research
challenge for many years to come.






6 Acknowledgements 

h wo k w uppo yn glL y n( 94 11334) ( ) 

04 9 1 0341 n u Ilow hpfo ..n ..n .. kly 

h n llo ppo un y o o 1 How h p fo . . 

References 



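1. S. Belongie, C. Carson, H. Greenspan, and J. Malik. Color- and texture-based
image segmentation using the expectation-maximization algorithm and its
application to content-based image retrieval. In Proc. Int. Conf. Computer
Vision, Bombay, India, Jan. 1998.

2. S. Belongie and J. Malik. Finding boundaries in natural images: a new method
using point descriptors and area completion. In Proc. of Fifth ECCV, Freiburg,
1998.

3. J. Canny. A computational approach to edge detection. IEEE Trans. Pattern
Anal. Mach. Intell., 8(6):679-698, 1986.

4. A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from
incomplete data via the EM algorithm. J. Royal Statistical Society, 39(B),
1977.

5. T. Leung and J. Malik. Contour continuity in region based image
segmentation. In Proc. of Fifth ECCV, Freiburg, pp. 544-559, 1998.

6. J. Malik and P. Perona. Preattentive texture discrimination with early
vision mechanisms. J. Optical Society of America, 7(2):923-932, May 1990.

7. M. C. Morrone and R. A. Owens. Feature detection from local energy. Pattern
Recognition Letters, 6:303-313, 1987.

8. D. Mumford and J. Shah. Optimal approximations by piecewise smooth functions
and associated variational problems. Comm. Pure Math., pages 577-684, 1989.

9. P. Parent and S. W. Zucker. Trace inference, curvature consistency, and
curve detection. IEEE Trans. Pattern Anal. Mach. Intell., 11(8):823-839,
Aug. 1989.

10. P. Perona and J. Malik. Detecting and localizing edges composed of steps,
peaks and roofs. In Proc. Int. Conf. Computer Vision, pages 52-57, Osaka,
Japan, 1990.

11. J. Shi and J. Malik. Normalized cuts and image segmentation. In Proc. IEEE
Conf. Computer Vision and Pattern Recognition, pages 731-737, San Juan, Puerto
Rico, June 1997.

12. J. Shi and J. Malik. Motion segmentation and tracking using normalized
cuts. In Proc. Int. Conf. Computer Vision, Bombay, India, Jan. 1998.

13. S. Ullman. Filling-in the gaps: the shape of subjective contours and a
model for their generation. Biological Cybernetics, 25:1-6, 1976.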



o 1 fo 




Geometric Grouping of Repeated Elements 
within Images 



Frederik Schaffalitzky and Andrew Zisserman 



Department of Engineering Science 
University of Oxford, UK 
{fsm, az}@robots.ox.ac.uk



Abstract. The objective of this work is the automatic detection and 
grouping of imaged elements which repeat on a plane in a scene (for ex- 
ample tiled floorings) . It is shown that structures that repeat on a scene 
plane are related by particular parametrized transformations in perspec- 
tive images. These image transformations provide powerful grouping con- 
straints, and can be used at the heart of hypothesize and verify grouping 
algorithms. The parametrized transformations are global across the im- 
age plane and may be computed without knowledge of the pose of the 
plane or camera calibration. 

Parametrized transformations are given for several classes of repeating 
operation in the world as well as groupers based on these. These groupers 
are demonstrated on a number of real images, where both the elements 
and the grouping are determined automatically. 

It is shown that the repeating element can be learnt from the image, and 
hence provides an image descriptor. Also, information on the plane pose, 
such as its vanishing line, can be recovered from the grouping. 



1 Introduction 

Grouping is one of the most fundamental objectives of Computer Vision and 
pervades most of the disparate sub-disciplines; for example object recognition 
always involves perceptual organization (or figure/ground separation) to some 
extent; shape-from-texture involves grouping texels; boundary detection involves 
grouping edgels (e.g. saliency of curves); motion segmentation involves grouping 
independently moving objects over multiple frames etc. 

In this paper we investigate the grouping of repeated structures. The
motivation for this is three-fold: first, repetitions are common in the world;
examples include parquet floor tilings, windows, bricks, patterns on fabrics, wall- 
paper; second, the groupings provide a compact image descriptor, essentially a 
‘high level’ feature, which may be used for image matching — for example in 
image database retrieval, model based recognition, and stereo correspondence; 
third, the retrieved repeating operation can provide shape and pose information 
— for example the vanishing line of a plane — in a similar manner to that of 
shape-from-texture. 



D.A. Forsyth et al. (Eds.): Shape, Contour LNCS 1681, pp. 165-181, 1999. 
(c) Springer- Verlag Berlin Heidelberg 1999 







To be specific the objective is quite simply stated: suppose a structure is 
repeated in the world a number of times by some operation (for example a 
translation); then identify this structure and all its repetitions from a perspec- 
tive image. The outcome is the imaged element, and a grouping over the imaged 
repetitions. This simple statement does belie the actual difficulty of a computa- 
tional procedure since a priori the element is unknown — in fact the element 
only ‘exists’ because it is repeated by the (unknown) image operation.

For the body of the paper we specialize the operation to repetitions on a 
plane, and return to a more general setting in section 4. In particular it is shown 
in section 2 that the operation of repeating by a translation on a scene plane 
induces relations between the imaged elements. These relations are represented 
by a parametrized transformation. There are only four parameters that need be 
specified, and these may be determined from the image. 

The significance of this transformation is that it provides a necessary condi- 
tion that must be satisfied by imaged elements related by a translation operation 
on a scene plane. The transformation is powerful as a basis for a grouping al- 
gorithm (a grouper) because of the following properties: it is global across the 
image plane; the class of transformation is independent of camera calibration; 
and, the class is independent of the pose of the scene plane. Furthermore, the 
transformation is exact under perspective projection, i.e. it does not require a 
weak perspective approximation. A grouper for this parametrized transformation 
is described in section 3.

Image relations of this type have appeared before in the literature. For
example, “1D relations” such as that a line is imaged as a line; that collinear
points are imaged as collinear points; and, that parallel lines are imaged as
concurrent lines; all have the above useful properties. These 1D relations have
been used
by Lowe [6], amongst others, as a basis for perceptual grouping. The relations 
described in this paper may be thought of as “2D relations”, and have been 
investigated previously by [4, 13]. Repeated 3D (i.e. non-planar) structures also 
induce image relations [7]. For example, points on objects with bilateral sym- 
metry (they are repeated by a reflection operation), or more generally points 
repeated by any 3D projective transformation, satisfy an epipolar geometry con- 
straint in the image [5, 7, 9]. There are also relations on the image outlines of 
particular classes of curved surfaces, such as straight homogeneous generalized 
cylinders, and these have been employed in grouping algorithms [8, 15, 16]. 



2 The Image Relation Induced by Repetitions on a Plane 

In this section we describe the image transformation that arises from the op- 
eration of repeating by a translation on a scene plane. The derivation is very 
short. 

In general the scene plane and image plane are related by a planar homography
(a plane projective transformation). This map is written x = PX where
P is a 3 × 3 homogeneous matrix, and x and X are homogeneous 3-vectors
representing corresponding points on the image and scene plane respectively.
The transformation P has 8 degrees of freedom (dof).

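On the scene plane the repeating translation is represented as $X' = TX$,
where

\[ T = \begin{bmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{bmatrix}
     = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
     + \begin{pmatrix} t_x \\ t_y \\ 0 \end{pmatrix}
       \begin{pmatrix} 0 & 0 & 1 \end{pmatrix}. \]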

The image transformation H between the points x and x', which are the
images of X and X', will be called a conjugate translation. The reason for this
is evident from $x' = PX' = PTX = PTP^{-1}x$, so that $x' = Hx$, where
$H = PTP^{-1}$. See figure 1.






Fig. 1. A translation T on a world plane induces a conjugate translation H in the 
image. 



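The conjugate translation H may be written as

\[ H = PTP^{-1}
     = P \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} P^{-1}
     + P \begin{pmatrix} t_x \\ t_y \\ 0 \end{pmatrix}
         \begin{pmatrix} 0 & 0 & 1 \end{pmatrix} P^{-1}
     = I + \lambda\, v\, l_\infty^{T}
     \quad \text{with } v \cdot l_\infty = 0 \qquad (1) \]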

where I is the 3 × 3 identity, and

– The 3-vector $v$ is the vanishing point of the translation direction. It is a
fixed point of H.

– The 3-vector $l_\infty$ is the vanishing line of the scene plane. It is a
line of fixed points under H.

– The scalar $\lambda$ represents the translation magnitude.

The geometric interpretation is illustrated in figure 2.

The transformation has only 4 dof, and these may be specified by the line
$l_\infty$ (2 dof), a point $v$ on $l_\infty$ (1 dof), and $\lambda$ (1 dof).
This is four less dof than a general homography, and two less dof than the
canonical and ‘simple’ affine transformation used by many authors in the past
for this type of grouping [4], yet the transformation H exactly models
perspective effects which are not accounted for by an affine transformation.

Fig. 2. Geometric interpretation of the parameters of a conjugate translation
(elation).

A few remarks on this transformation: The transformation applies to two
elements repeated by the translation anywhere on the image plane. If there is
a line of repetitions (as in figure 1) then the zeroth element is mapped to the
n-th as $H^n = I + n\lambda\, v\, l_\infty^{T}$. The transformation (1) can be
determined from two point or two line correspondences. Once the transformation
is determined, then so is $l_\infty$. A planar projective transformation with a
line of fixed points, and fixed points only on this line, is known in the
literature as an elation [10, 11].
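
The properties listed above are easy to check numerically. The sketch below builds an elation from hypothetical integer values of $\lambda$, $v$ and $l_\infty$ (chosen only so that $v \cdot l_\infty = 0$ and the arithmetic is exact) and confirms that $v$ is fixed, that a point on $l_\infty$ is fixed, and that repeated application scales $\lambda$ linearly. The helper names are illustrative.

```python
def matvec(H, x):
    """Apply a 3x3 matrix to a homogeneous 3-vector."""
    return [sum(H[i][j] * x[j] for j in range(3)) for i in range(3)]

def matmul(A, B):
    """Product of two 3x3 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def elation(lam, v, l):
    """H = I + lam * v * l^T, valid as an elation only when v . l = 0."""
    if sum(a * b for a, b in zip(v, l)) != 0:
        raise ValueError("v must lie on the line l")
    return [[(1 if i == j else 0) + lam * v[i] * l[j] for j in range(3)]
            for i in range(3)]

# Hypothetical parameters (not taken from the paper).
l_inf = [0, 1, -2]          # vanishing line
v = [1, 2, 1]               # vanishing point: 0*1 + 1*2 - 2*1 = 0
lam = 3
H = elation(lam, v, l_inf)

fixed_v = matvec(H, v)                  # v is a fixed point
on_line = [5, 2, 1]                     # 0*5 + 1*2 - 2*1 = 0, so it lies on l_inf
fixed_p = matvec(H, on_line)            # every point of l_inf is fixed
moved = matvec(H, [1, 1, 1])            # a generic point moves towards/away from v

H4 = H
for _ in range(3):
    H4 = matmul(H4, H)                  # four repetitions of the same step
print(fixed_v, fixed_p, moved, H4 == elation(4 * lam, v, l_inf))
```

The identity for repeated application follows from $(v\, l_\infty^{T})^2 = v\,(l_\infty^{T} v)\, l_\infty^{T} = 0$, so all higher-order terms in the expansion of $H^n$ vanish.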



2.1 Grids 

An extension to repeating by a single translation is where there is a
repetition in two directions so that the world pattern is a grid of repeated
elements. The image is then a conjugate grid. This mapping can be thought of as
being composed of two elations

\[ H_v = I + \lambda\, v\, l_\infty^{T}, \qquad H_u = I + \mu\, u\, l_\infty^{T} \]

one for each direction u, v, i.e. a total of six degrees of freedom. However,
note that $l_\infty$ is common to both, so that once the transformation is
determined in one direction only two degrees of freedom remain for the
transformation in the other direction. These two degrees of freedom can be
determined by one point correspondence.
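
One way to see why a single correspondence suffices: writing $w = \mu u$, the map is $H_u x = x + w\,(l_\infty \cdot x)$, and because $l_\infty \cdot w = 0$ the quantity $l_\infty \cdot x$ is unchanged by $H_u$. The unknown homogeneous scale of the measured point $x'$ is therefore $s = (l_\infty \cdot x') / (l_\infty \cdot x)$, after which $w$ follows by subtraction. The sketch below implements this reasoning; it is an illustration derived from equation (1), not a procedure given in the paper, and the function name and numbers are hypothetical.

```python
from fractions import Fraction

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def second_direction(l_inf, x, x_prime):
    """Recover w = mu * u of H_u = I + w l_inf^T from one correspondence.

    x and x_prime are homogeneous 3-vectors; x_prime is known only up to
    scale. Neither point may lie on the vanishing line l_inf.
    """
    lx = dot(l_inf, x)
    lxp = dot(l_inf, x_prime)
    if lx == 0 or lxp == 0:
        raise ValueError("points on the vanishing line carry no information")
    s = Fraction(lxp) / lx                  # scale such that x_prime = s * H_u x
    return [(Fraction(xp) / s - xi) / lx for xp, xi in zip(x_prime, x)]

# Hypothetical data: l_inf known from the first direction, w_true to recover.
l_inf = [0, 1, -2]
w_true = [Fraction(1, 2), 2, 1]             # satisfies l_inf . w_true = 0
x = [1, 1, 1]
hx = [xi + wi * dot(l_inf, x) for xi, wi in zip(x, w_true)]
x_prime = [4 * c for c in hx]               # arbitrary homogeneous rescaling
w = second_direction(l_inf, x, x_prime)
print(w)
```

Exact rational arithmetic is used only to keep the example free of rounding; with measured image points one would expect floating-point values and some tolerance.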

3 Grouping Imaged Repeated Patterns 

In the previous section it has been shown that elements that repeat by a
translation on the plane are related in the image by an elation. Thus the
problem of grouping repeated patterns can be reduced to that of finding image
elements related by an elation, and the rest of this section describes a
grouping algorithm for elations.

Initially we do not know the elements or the transformation. This is the 
chicken and egg problem that often arises in computer vision: if we know the 
elements we can easily determine the four parameters of the transformation; 
conversely, if we know the transformation we can (relatively) easily identify el- 
ements. In essence then the grouping algorithm must determine simultaneously 
a transformation (model) and elements consistent with that transformation. A 
similar situation arises in estimating multiple view relations from several images 
of a scene, for example the epipolar geometry [12], and ideas can be borrowed 
from there. 

In outline the idea is to first hypothesize a set of elements and associa- 
tions between these elements. This set is then explored to evaluate if it contains 
groupings consistent with an elation. This is a search, but it can be made very 
efficient by a hypothesize and verify approach: the four parameters of the elation 
are determined from a small number (one or two) of associations, and this hy- 
pothesized elation is then verified by testing how many members of the set are 
mapped under it. This search is equivalent to the problem of robustly fitting a 
model to data containing outliers. In this case the model is the transformation, 
and the outliers are the members of the set which are not mapped under the 
transformation. Depending on how the elements and associations are obtained, 
a very large proportion of the set may consist of outliers. 

The algorithm is summarized in the following section, and illustrated by 
working through an example. 



3.1 Elation Grouping Algorithm 

There are five stages to the algorithm. The first two stages are aimed at obtaining 
seed correspondences. The seeds are elements and their associations, and should 
be sufficiently plentiful that some of the actual elements and associations of 
the sought elation are included. It is not the aim at this stage that all seed 
correspondences are correct. 

1. Compute interesting features. These may include interest points (e.g. corners), 
edges, closed regions, oriented texture determined by a set of filter banks etc. 
The aim is simply to identify regions of the image that are sufficiently interesting. 
See figures 3-5. 

2. Associate features. An affinity score is then employed to associate features 
that may be related. Generally the affinity is based on 'similarity' and proximity. 
An example would be cross-correlation of nearby interest point intensity 
neighbourhoods. The choice of affinity score is driven primarily by the invariance 
sought on the scene plane: for example, that the albedo (reflectance) of 
an element should be exactly repeated in the scene. However, illumination and 
imaging effects require that the affinity score has a greater degree of photometric 
and geometric invariance. At the most basic level the aim is to use photometric 
cues to filter out obvious mismatches (e.g. matching a predominantly black 
region with a predominantly white region), but to retain plausible matches. It 
is important that the affinity score is at least partially geometrically invariant 
to the transformation sought: if the affinity score is too sensitive to the effects 
of the transformation, it could reject correct matches. For example, intensity 
cross-correlation is invariant to translation, but is not invariant to the rotation and 
skewing which occur under an elation. 

One approach is to use combined affine/photometric invariants [14]. These 
can be applied to regions bounded by automatically detected closed curves. The 
advantages of such invariants are twofold: first, invariants can be matched 
efficiently using indexing; second, they can be associated globally across the image. 
An example is shown in figure 3. Another approach is to determine geometric 
features, since these are largely invariant to photometric conditions, and then 
associate the features based on the intensity cross-correlation of their 
neighbourhoods. An example of this is shown in figure 6. 
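
The second approach (proximity plus intensity cross-correlation) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature representation (an `(x, y, patch)` tuple with the patch as a flat list of intensities) and the names `ncc`, `seed_matches`, `max_dist` and `min_score` are assumptions made for the example. Normalised cross-correlation is used because it is unaffected by a gain and offset change in intensity, which gives a limited degree of photometric invariance.

```python
import math


def ncc(patch_a, patch_b):
    """Normalised cross-correlation of two equal-sized patches, in [-1, 1]."""
    n = len(patch_a)
    if n == 0 or n != len(patch_b):
        raise ValueError("patches must be non-empty and of equal size")
    mean_a = sum(patch_a) / n
    mean_b = sum(patch_b) / n
    da = [a - mean_a for a in patch_a]
    db = [b - mean_b for b in patch_b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        return 0.0  # a flat patch has no defined correlation: treat as no match
    return sum(x * y for x, y in zip(da, db)) / denom


def seed_matches(features, max_dist, min_score):
    """Pair up features that are both nearby and similar in appearance.

    features: list of (x, y, patch). Returns a list of (i, j, score), i < j.
    """
    matches = []
    for i in range(len(features)):
        xi, yi, pi = features[i]
        for j in range(i + 1, len(features)):
            xj, yj, pj = features[j]
            if math.hypot(xi - xj, yi - yj) > max_dist:
                continue  # proximity test
            score = ncc(pi, pj)
            if score >= min_score:  # similarity test
                matches.append((i, j, score))
    return matches
```

The thresholds would be set loosely in practice, since the aim at this stage is only to discard obvious mismatches while keeping enough correct seeds.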




one cluster verified grouping enlarged grouping 



Fig. 3. Seed matches using closed curves. The idea here is to identify interesting 
regions by detecting closed Canny [1] edge contours, and then determine if these 
regions are related by affine transformations by computing their affine texture 
moment invariants [14]. Regions which are related by an affine transformation 
have the same value for affine invariants. Thus clustering on the invariants yields 
a putative grouping of regions. Eight affine invariants are computed, so each 
curve gives a point in an 8-dimensional space. The points are clustered in this 8D 
space by the k-means clustering algorithm. The plot shows the distribution and 
clustering of the zeroth order moments of shape (horizontal axis) and intensity 
(vertical axis). The cluster used as a hypothesised grouping is the bottom left- 
most one. 








The next stage is a robust estimation of the elation based on the seed corre- 
spondences. 

3. RANSAC [2] robust estimation. An elation can be instantiated from a minimal 
number of correspondences that provide either (a) two line correspondences, no 
two of which are collinear or (b) two line correspondences, two lines of which 
are collinear, and one point correspondence on the other two lines. The robust 
estimation then proceeds as follows: 

1. Select a random minimal sample of seed correspondences and compute the 
elation H. 

2. Compute the number of inliers consistent with H, i.e. the number of other 
seed correspondences that map under H. 

3. Choose the H with the largest number of inliers. 

For example, in figure 7, the white lines denote the initial seed correspondence 
chosen and the darker^ lines denote the correspondences found to be consistent 
with the elation estimated from the seed. 
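
The hypothesize-and-verify loop above can be sketched in a few lines. For brevity this sketch instantiates the elation from two point correspondences rather than from line correspondences as in (a) and (b); all function names, the tolerance `tol` and the number of `trials` are illustrative assumptions. The estimator uses the fact that under H = I + v mᵀ (with m = λ l∞ and m·v = 0) each point moves along the line joining it to the vanishing point v.

```python
import math
import random


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def solve3(rows, rhs):
    """Solve a 3x3 linear system by Cramer's rule; None if near singular."""
    det = dot(rows[0], cross(rows[1], rows[2]))
    if abs(det) < 1e-12:
        return None
    cols = [list(c) for c in zip(*rows)]
    sol = []
    for k in range(3):
        m = [list(c) for c in cols]
        m[k] = list(rhs)
        sol.append(dot(m[0], cross(m[1], m[2])) / det)
    return sol


def elation_from_two_points(c1, c2):
    """Hypothesise H = I + v m^T from two matches (x, y), homogeneous 3-lists."""
    v = cross(cross(c1[0], c1[1]), cross(c2[0], c2[1]))  # meet of the two joins
    rows, rhs = [v], [0.0]                               # constraint m . v = 0
    for x, y in (c1, c2):
        xv = cross(x, v)
        n2 = dot(xv, xv)
        if n2 < 1e-12:
            return None                                  # degenerate sample
        alpha = dot(cross(y, v), xv) / n2                # y = alpha*x + beta*v
        beta = dot(cross(x, y), xv) / n2
        if abs(alpha) < 1e-12:
            return None
        rows.append(x)
        rhs.append(beta / alpha)                         # constraint m . x = beta/alpha
    m = solve3(rows, rhs)
    if m is None:
        return None
    return [[(1.0 if r == c else 0.0) + v[r] * m[c] for c in range(3)]
            for r in range(3)]


def transfer(h, x):
    y = [dot(row, x) for row in h]
    return [y[0] / y[2], y[1] / y[2], 1.0]


def transfer_error(h, match):
    x, y = match
    if abs(dot(h[2], x)) < 1e-12:
        return float("inf")
    p = transfer(h, x)
    return math.hypot(p[0] - y[0], p[1] - y[1])


def ransac_elation(matches, trials=200, tol=1.0, seed=0):
    """Return the hypothesis with the most inliers, and those inliers."""
    rng = random.Random(seed)
    best_h, best_inliers = None, []
    for _ in range(trials):
        h = elation_from_two_points(*rng.sample(matches, 2))
        if h is None:
            continue
        inliers = [m for m in matches if transfer_error(h, m) < tol]
        if len(inliers) > len(best_inliers):
            best_h, best_inliers = h, inliers
    return best_h, best_inliers
```

Scoring by inlier count mirrors step 3 above; a fixed number of trials is used here only to keep the sketch short.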

The RANSAC fit provides an initial estimate of the elation. This estimate is 
then refined by the following stage. 

4. Maximum Likelihood Estimation (MLE). Re-estimate the four parameters of 
H from all correspondences classified as inliers by minimizing a ML cost func- 
tion. A ML estimation requires the estimation of the elation together with a 
set of auxiliary points which map exactly under the estimated elation. The cost 
function is the image distance between the measured and auxiliary points. As- 
suming the measurement error is Gaussian, then minimizing this cost function 
provides the ML estimate of the elation. See [12] for a description of MLE for 
homographies. 
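
Written out, a cost of this kind takes the following form. The notation is ours: $x_i \leftrightarrow x_i'$ are the measured inlier correspondences, $\hat{x}_i$ the auxiliary points, $d(\cdot,\cdot)$ is Euclidean image distance, and the sum of squared distances is the usual choice under the Gaussian assumption (the source does not spell out the exact expression):

```latex
\min_{H,\ \{\hat{x}_i\}} \ \sum_i \Big( d\big(x_i, \hat{x}_i\big)^2
      + d\big(x_i', H\hat{x}_i\big)^2 \Big),
\qquad H = I + \lambda\, v\, l_\infty^\top, \quad l_\infty^\top v = 0 .
```

The minimisation is over the four parameters of the elation together with the auxiliary points, so the corrected correspondences $\hat{x}_i \leftrightarrow H\hat{x}_i$ satisfy the estimated elation exactly.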

For the example at hand, the vanishing line and vanishing point of the MLE 
are shown in figure 8. 

5. Guided matching. Using the estimated parameters search for new elements 
consistent with the model by defining a search region about the transferred 
element position. As figure 8 shows, the location of new elements predicted by 
the MLE can be very accurate. 
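
A minimal sketch of this guided search is given below, assuming elements and candidate detections are represented by image points `(x, y)`; the function names and the `radius` parameter are illustrative, not taken from the paper. Each known element is transferred by the estimated elation and any candidate inside the search region around the predicted position is accepted.

```python
import math


def transfer(h, point):
    """Map an image point (x, y) by the 3x3 homography h."""
    x, y = point
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return (u / w, v / w)


def guided_matches(h, elements, candidates, radius):
    """Return (predicted, candidate) pairs for candidates found near predictions."""
    found = []
    for e in elements:
        px, py = transfer(h, e)

        def dist(c):
            return math.hypot(c[0] - px, c[1] - py)

        near = [c for c in candidates if dist(c) <= radius]
        if near:
            found.append(((px, py), min(near, key=dist)))  # closest candidate wins
    return found
```

Because the elation is tightly estimated at this point, the search radius can be small, which is what keeps false positives down.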

Further examples of elation grouping, using exactly the same algorithm, are 
shown for other images in figures 9 and 10. 



3.2 Grid Grouping Algorithm 

A similar hypothesize and verify algorithm to the elation grouper may be applied 
to the case of a conjugate grid, described in section 2.1. Examples of the grid 
grouper are shown in figures 11-14. 



^ Blue in the luxury edition of this paper. 








Original image. Fitted lines. 



Fig. 4. The sought (but unknown) element/grouping is the repeated floor tiling. 
The features which successfully provide this element/grouping are line pair in- 
tersections. The first stage in determining the features is fitting straight lines to 
Canny edge detector output. 




Line intersections Lines and points together 



Fig. 5. Left: points of intersection of the lines found above. The line segments 
are extended slightly to allow intersections just beyond their endpoints. Right: 
lines and intersection points together. Note that line intersections do identify 
the corners of the floor tilings, but these points are only a small proportion of 
all the intersections detected. 







Closeup of features 



Seed matches 



Fig. 6. Left: a closeup of the computed intersections. Right: the black lines join 
pairs of intersections which are deemed to look similar on the basis of intensity 
neighbourhood correlation. Line pairs which reverse orientation are excluded, 
since these cannot map under an elation. 




Sample with highest support 




Sample with next highest support 



Fig. 7. These two images demonstrate the core of the method: each seed match 
(shown in white) is sufficient to determine an elation in the image. These putative 
elations can be verified or rejected by scoring them according to the number 
of feature correspondences consistent with them. The two seed matches whose 
corresponding elations received the highest support are shown. 








Fig. 8. Given the inliers to the elation grouping, the parameters of the elation 
can be estimated (MLE) more accurately. Left: the ground plane vanishing line 
and vanishing point of the translation direction. Note that the horizontal line 
is a very plausible horizon line and that the feature tracks all pass through 
the vanishing point. Right: the accuracy of the estimated parameters is also 
demonstrated by transferring elements under the elation: the extended tiling is 
obtained by mapping the original image lines under the estimated elation. 




Fig. 9. More examples of the elation grouper in action. The corners of the win- 
dows of the building on the left have been grouped together by the elation 
constraint. The wall on the right has two sizes of brick, but they are grouped 
together here by virtue of satisfying the same elation constraint. 









Fig. 10. These figures show two groupings found in the same image by the 
elation grouper. Despite the difference in pose of the planes in the world, the 
same grouping algorithm is successful for both cases. 



The importance of guided matching, which is the final stage of the algorithm, 
is very well illustrated in these examples. The previous stages of the algorithm 
have delivered a ML estimate of the transformation, and a number of elements 
which are mapped under the estimated transformation. In the case of a grid it is 
a simple matter to determine which of the integer grid positions is unoccupied, 
and then search the image for evidence of an element at the corresponding image 
point. 
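
The bookkeeping for the grid case can be sketched as below. This assumes the grid is described by the vanishing line `l_inf` and two vectors `a_u`, `a_v` (the scaled vanishing points μu and λv, each satisfying l∞·a = 0), so that position (i, j) is reached by I + (i a_u + j a_v) l∞ᵀ; the function names and the bounding-box convention are our own choices for illustration.

```python
def unoccupied_positions(occupied):
    """Integer grid cells inside the bounding box of `occupied` with no element."""
    if not occupied:
        return []
    i_vals = [i for i, _ in occupied]
    j_vals = [j for _, j in occupied]
    taken = set(occupied)
    return sorted((i, j)
                  for i in range(min(i_vals), max(i_vals) + 1)
                  for j in range(min(j_vals), max(j_vals) + 1)
                  if (i, j) not in taken)


def predict(x0, l_inf, a_u, a_v, i, j):
    """Image point of grid cell (i, j), given the image point x0 of cell (0, 0).

    Applies I + (i*a_u + j*a_v) l_inf^T to the homogeneous point x0.
    """
    s = sum(p * q for p, q in zip(l_inf, x0))
    y = [x0[k] + (i * a_u[k] + j * a_v[k]) * s for k in range(3)]
    return (y[0] / y[2], y[1] / y[2])
```

Each predicted point would then be handed to the verification step described next, for example by correlating against the nearest existing element.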

In detail, an element is verified by comparing it with the nearest (in 
the image) existing element of the grid. In figure 12 the similarity is measured 
by cross-correlation. This procedure identifies elements which have been missed 
in the initial feature detection. There may well not be any features present, but 
because the transformation and intensity are tightly estimated, false positives 
are not generated. Another possibility is to reapply the segmentation in the 
indicated region, but with the segmentation parameters suitably modified to 
allow for a perspective scaling. For example, if a square is 50% of the size 
of its neighbour, then the Gaussian width of a Canny edge detector could be 
reduced to detect sharper edges, and the line length thresholds also reduced. 

The output of the grid grouper provides many examples of the imaged element. 
From these we can now estimate the frontoparallel intensity on the tile: 
each projectively distorted element is warped into the unit square and the 
resulting textures are averaged in the unit square. The image can then be synthetically 
generated by applying the learnt transformation to the estimated intensity of the 
element. This is demonstrated in both figure 13 and figure 14. It demonstrates 
that the element plus grouping does provide a succinct description for substantial 
parts of the image. 

Fig. 11. The first stage of the grid grouping algorithm. From the original image 
(left), an initial grouping (right) is computed by associating features using only 
correlation of intensity neighbourhoods. 

Grid structure found in the grouping. New elements found by guided search. 

Fig. 12. The second stage of the grid grouping algorithm. The initial grouping 
found is processed to elucidate the spatial organisation, namely the grid-like 
structure of the locations of the elements. This structure is then used to guide 
a global search for new elements. Note that although only half of the potential 
elements are determined by the initial fit of figure 11, the tight constraints on 
geometry and intensity provided by the transformation enable virtually every 
visible element to be identified. 

Fig. 13. The floor is generated from the learnt element and spatial organization 
of the grid. 

Fig. 14. Another example of the grid grouper. Left: original image. Right: the 
pattern is generated from the element (which is included as an inset) and 
grouping determined by the algorithm. Note the algorithm only selects elements 
belonging to the grid. The two planes in the scene are geometrically 
indistinguishable. 

3.3 Grouping Performance 

It is evident from these examples that the elation and grid groupers perform 
extremely well: for example, the grid grouper identifies all the non-occluded elements 
with no false positives. This success can be attributed largely to the fact that the 
transformation has been modelled exactly, and that it is heavily overdetermined 
by the available image data, i.e. there are many more correspondences, which 
provide constraints, than the four parameters which must be determined. 

Ideally the algorithm should return a description of the element and a spatial 
organisation of the grouping. It is easier to determine the element for the grid 
than for the elation, because in the case of the grid the element is delineated 
in both directions, whereas for the elation the element is only demarcated in the 
translation direction. 

The organization of the grouping is quite primitive at present, consisting 
of little more than the grid positions occupied. A more compact description 
would be the element and a set of operations which generate the grid. Such a 
description is not uniquely defined of course, as the same grid can be generated 
by repeating an element by one unit or by repeating a pair of elements at two 
units of spacing. In fact, for the group of integer displacements on the plane along 
the two coordinate axes, the grid can be generated by any one of the following 
sets of translation vectors 




Although clearly the first two are a more suitable choice as a basic generator. 

One idiosyncrasy of using the number of inliers as a scoring mechanism in 
RANSAC is that generators at the smallest repeating distance will always be 
selected because there will be more of these present in the seed set. 

There are also various meta-groupings that could be used to spatially organize 
the data. For example the top windows in figure 3 may be organized as four 
meta-groupings, each consisting of nine grouped elements. 

4 Conclusions and Extensions 

Here we have investigated in detail one repeating operation on a plane, namely 
a translation, for which the induced transformation is an elation. This serves 
as an exemplar for the other wallpaper groups (discrete subgroups of the 2D 
affine group) of repeating operations on a plane, such as glide, rotation and 
reflection groups [3]. Indeed the operation need not be restricted to a single 
plane. Examples of similar repeating operations and the induced image relation 
are shown in the table below. 







Transformation                 Induced image relation 

Elation                        $I + v\, l_\infty^\top$, where $l_\infty \cdot v = 0$. 

Family of Planar Homologies    $I + k\, v\, l_\infty^\top / (l_\infty^\top v)$ for integers $k$. 

Conjugate Rotation             $H^n = I$ for an n-fold symmetry. 

Parallel Lines                 The imaged lines are concurrent. Equal spacing in 
                               the world determines the world plane vanishing line. 

[Table rows "Example image" and "Schematic": one image per transformation] 


Similar grouping strategies can be developed for each of these examples. Since 
in man-made scenes there is a plentiful supply of elements that do exactly repeat 
on planes, it is certainly worth building groupers for those repeating operations 
that commonly occur. It is clear there are always two aspects that must be 
considered when designing such groupers: 

1. Grouping geometry: Given a repeating operation in the world, determine 
the geometric relationships that are induced in the image between the imaged 
repeated elements. 

2. Grouping strategy: Develop a grouping strategy based on these relations. 
This will usually involve a choice on the degree of geometric and photometric 
invariance required for related elements. 

















It is certainly plausible that efficient and reliable groupers can be built for vir- 
tually any class of exact repeating operation. However, several degrees of greater 
generality will be required for the non-exact repetitions that also commonly oc- 
cur: even if the repeating operation is on a plane, it is often the case that either 
the repetition is not exact, or the element is not exactly repeated by the repeti- 
tion. A brick wall has both these problems. This type of non-exactness can be 
modelled by drawing the repeating parameter from a suitable statistical distribu- 
tion. A far more demanding extension is to the type of repetition that occurs for 
leaves on a tree, where the colour, shape and size will vary from leaf to leaf (trees 
are like that), there is a wide (but not uniform) distribution of element poses 
and there are complex lighting effects produced by both leaves and branches. 



Acknowledgements 

We are grateful to T. Leung for providing the building image of figure 9, and 
to Henrik Christensen for comments. The algorithms in this paper were 
implemented using the IUE/TargetJr software packages. This work was supported by 
the EPSRC IUE Implementation Project GR/L05969, an EPSRC studentship 
and EU ACTS Project Vanguard. 

References 

[1] J. Canny. A computational approach to edge detection. IEEE T-PAMI, 
8(6):679-698, 1986. 

[2] M. A. Fischler and R. C. Bolles. Random sample consensus: A paradigm for model 
fitting with applications to image analysis and automated cartography. Comm. 
ACM, 24(6):381-395, 1981. 

[3] D. Hilbert and S. Cohn-Vossen. Geometry and the Imagination. Chelsea, NY, 
1956. 

[4] T. Leung and J. Malik. Detecting, localizing and grouping repeated scene elements 
from an image. In Proc. ECCV, LNCS 1064, pages 546-555. Springer-Verlag, 1996. 

[5] J. Liu, J. Mundy, and A. Zisserman. Grouping and structure recovery for images 
of objects with finite rotational symmetry. In Proc. Asian Conf. on Computer 
Vision, volume I, pages 379-382, 1995. 

[6] D. G. Lowe. Perceptual Organization and Visual Recognition. Kluwer Academic 
Publishers, 1985. 

[7] J. Mundy and A. Zisserman. Repeated structures: Image correspondence 
constraints and ambiguity of 3D reconstruction. In J. Mundy, A. Zisserman, and 
D. Forsyth, editors, Applications of Invariance in Computer Vision, pages 89-106. 
Springer-Verlag, 1994. 

[8] J. Ponce, D. Chelberg, and W. B. Mann. Invariant properties of straight 
homogeneous generalized cylinders and their contours. IEEE T-PAMI, 11(9):951-966, 
1989. 

[9] C. Rothwell, D. Forsyth, A. Zisserman, and J. Mundy. Extracting projective 
structure from single perspective views of 3D point sets. In Proc. ICCV, pages 
573-582, 1993. 

[10] J. Semple and G. Kneebone. Algebraic Projective Geometry. Oxford University 
Press, 1979. 

[11] C. E. Springer. Geometry and Analysis of Projective Spaces. Freeman, 1964. 

[12] P. H. S. Torr and A. Zisserman. Robust computation and parameterization of 
multiple view relations. In Proc. ICCV, pages 727-732, January 1998. 

[13] L. Van Gool, T. Moons, and M. Proesmans. Groups, fixed sets, symmetries and 
invariants, part I. Technical Report KUL/ESAT/MI2/9426, Katholieke 
Universiteit Leuven, ESAT/MI2, 1994. 

[14] L. Van Gool, T. Moons, and D. Ungureanu. Affine / photometric invariants for 
planar intensity patterns. In Proc. ECCV, pages 642-651. Springer-Verlag, 1995. 

[15] M. Zerroug and R. Nevatia. From an intensity image to 3-d segmented 
descriptions. In J. Ponce, A. Zisserman, and M. Hebert, editors, Object Representation 
in Computer Vision, LNCS 1144, pages 11-24. Springer-Verlag, 1996. 

[16] A. Zisserman, J. Mundy, D. Forsyth, J. Liu, N. Pillow, C. Rothwell, and S. Utcke. 
Class-based grouping in perspective images. In Proc. ICCV, 1995. 




Constrained Symmetry for Change Detection 



Rupert W. Curwen and Joe L. Mundy* 



G.E. Corporate Research and Development 
1 Research Circle 
Niskayuna, NY 12309 



Abstract. The automation of imagery analysis processes leads to the 
need to detect change between pairs of aerial reconnaissance images. Ap- 
proximate camera models are available for these images, accurate up to 
a translation, and these are augmented with further constraints relat- 
ing to the task of monitoring vehicles. Horizontal, bilateral, Euclidean 
symmetry is used as a generic object model by which segmented curves 
are grouped, first in a 2-d approximation, and then in 3-d, resulting in a 
sparse 3-d Euclidean reconstruction of a symmetric object from a single 
view. The method is applied to sample images of parked aircraft. 



1 Introduction 



1.1 Change Detection and Aerial Surveillance 



Mu h 


0 


ou 


wo k 


n 


g un n 


ng 


n 


1 


1 




V 


n 


y 


h 


n 




0 


U 0 




h 


P 0 


0 1 


g 


y 


n ly 




h n 






n 


n 


V 




1 


ul 


0 


h g 


ow h 


n g y 


ou 


0 


h 


n 


11 g n 


0 




un 


y 


0 




n 


w 


h po 


ol w 


ono 


. h 


n 


w 


h 


0 


V 


1 


1 


0 


n 


ly 




on 


nu 


0 g ow 


u h 






h 


n 


0 n 






P 


' 0 


u 


V y 


h 


on 


n h 


g ow h 0 


h n 


11 


g n 


wo k 0 




h 




wo 



onfl 


ng p 




u 


h V 


g n 






V 


1 


n 


0 p 0 


u V y 


ool 0 




g y 


n ly 




























h 


u 


n 


1 V 1 0 un 


n 


ng 0 


0 


pu 


V on 


11 


0 


ng 


1 0 


pi 




h 


n ly . 


n 




w 


h V 




P 


0 


n y 


n 


0 


h n 


ly 


wo k 


wh h 




0 




h n 


n 




n 1 0 


U 0 


on. 


Ou 


C 


’ g 


n 




0 


P 0 


u 


V 


y 


ool wh h w 


11 How 


h n ly 


0 


on n 


on 


h 


k 


h n 


0 


wh 




n h 


yl 0 


n 0 


p 0 u 


V y 


u 


ho 




k 


wh h 


n 




U 0 






houl 


U 


0 


V 


1 X 


pi 


h 


lo 




on 0 




w 


h n 


n 




g • 


h 


g 0 


w 




y 0 


whol 




y 


n h 


n 


ly 


w 


h 


0 


X 


n on 


P 


u 1 ng 


h n 


h 


on 






1 ffo 


nvolv 


n 


ply n 


ng h 


g u 


1 ng 


0 



* This work was supported by DARPA contract F33615-94-C-1021, monitored by 
Wright Patterson Air Force Base, Dayton, OH. The views and conclusions contained 
in this document are those of the authors and should not be interpreted as 
representing the official policies, either expressed or implied, of the Defense Advanced 
Research Projects Agency, the United States Government, or General Electric. 



D.A. Forsyth et al. (Eds.): Shape, Contour LNCS 1681, pp. 182-195, 1999. 
© Springer- Verlag Berlin Heidelberg 1999 







h n ly n 


V n 


gn 0 


WO 


1 k on p 0 


u ng 


po . 


P 0 


u 


V 


y ool 


u ng 


pi p 


1 


0 


1 0 




h g on 


0 h 


y 0 


n 


w 


h 


n 0 


0 1 


n qu 


kly lo 




n 


0 


n h u 1 ng n p 


n only 


h 


gon 


0 h u 


























h 


un on 1 y 




ly 


n 


ho no og ph 


n 0 




on 


y 


( ) 


n 


gl n 


0 


no pp 


0 nvolv 


ny 


g un 




n ng. 


OW V 


on u 


h u y 






0 pp 


n h 


wh 1 


h 


g 


y 


on 


0 p ov 


phy 


iiy 


V 






0 1 


h 


0 1 


1 


0 


X 


. 0 


X pi 


wh n 


h on 








11 0 h gh 1 u 




h 




h 


V y long 0 


1 1 ng h (know 


olloqu lly 


“ h 


0 


w Mo 


1”) 


n 


Igh 


0 n ngul 


1 




on w 11 ul 


n 1 g 


n 1 


on on 


h g 


oun . 


hu h 




g V n 


w 


h ] 


ti 


g y 


u u lly off y 


n 1 


on ] 


a 


u n 


0 


n 


0 






h n 1 


on n 0 


0 


ng 


h 


3 


wo 1 


n 0 X 


1 gn 


n w 


h 


h 




g • 














h 


p 0 1 


on 


0 


g og 


ph 1 g 


: on. 


n h 




nology 


0 h 


0 pu 


V on 


0 un 


y ’ 


h 


known 




1 


on 




n 


ng h 




0 1 


0 h 




g 


n 0 


h g og 


ph 


1 00 


n 




y 


hu w 


u 


1 


0 


P glo 


11 u 


long u 


n 


1 V 


on 


n 0 


g 


00 n 


■ g 0 






p 


on 0 h 


g on 0 


on 0 


n 


h 


glo 1 


00 n 


y 


n 


h : 


n 


p 


oj n 


0 h 


g • 


ow V 


w 


P 0 


V w 


h goo 




0 


h 


a 


priori o h p 


0 1 




n 


nly 


olv 


n 0 




w 


h 




n u 0 


on 


1 on 


X u 1 


n ' 


0 


on. 


h 


ny po 


1 




0 


U 0 


g 


xplo 


on on 


n 


g 




g 


w 


h 


3 


wo 1 0 1. 


h V 


on n 


on 


0 on 


n ly 


k wh h 


0 






n y n 


po 


h h 


ng wh 


h 0 


u 


1 y. h 




y 




n 0 u 


nu 


0 


u h 


on 


X 




h ng 


0 


ow 


V 


h 




h ng 


0 


h 


no 0 


' j 


] 


P 


0 qu 


u 


3 


0 


1 


0 


h 0 j 


0 


on 0 




n 


ly w h V 


n xplo ng 


h u 


0 


g n 




p on 0 0 j 


n n 


h 


P 


p w 


how how on 


u h c 


1 1 


y 




y n 


u 


0 p 0 


u 


u 




0 j 



p h ng on. 



1.2 Symmetry as a Generic Object Model 



y 




y h long n 


ogn z 




w 11 


n n u V ly 




1 g n 


0 


1 1 


p V V 


n g 


y 


u 


y 1 0 j 




0 h 


lly 


n 


yn lly 


0 1 


. 0 


h 


on V n n u 


lo j 


u h 


flow 




n xh 


h gh 


g 


0 y 


y. h V 


xplo 


h 


on 0 un on n 


y 


n 


0 h 


n n n 


u 1 


n 4 . 


n 


g y n ly 


h 


ny 


1 V 


n 0 j wh h 


xh 


ong 


1 


1 


y y- Mo 


V h 1 


n lu 


ng 


n ny 


n 


u 


u 




y n 


3 pi 


n p 


p n 


ul 0 h g oun pi 


n . u n 


h 


0 


n w 1 0 h V 


h 




0 1 


n h on 


on 


n n 


1 


h 


g oup ng 0 h 


y 


0 


j n V n 0 p 


3 


u 1 n 



on u on o ngl v w. 




184 Rupert W. Curwen and Joe L. Mundy 







0 wo k u h h 


0 


0 


hw 


11 1 


on 


h 






po 


1 0 




on 


u 




1 iiy y 






0 


j 


up 0 p oj 


V 


y- 


OW V 


h 


wo 


k 


0 


no 


1 w h h 


0 


pi 


X 


0 


1 wo 1 


g 


w 


h 


lu 


n 


0 


lu 


on. 




h h on 1 






on 


n w 


1 


0 


u 


0 


iiy 






u 


h 


1 iiy y 




0 j 


w hou h n 


0 p ov 




1 


g 0 






p on . 


























n 


X 


pi w w 11 


u 


y 




y 


0 h 


p 


n 


0 




n 0 


n 






n 


g V n V w. 


On 


u h 


V w 


hown n 


gu 


1. 


0 


h 


h 


qu 


1 


y 0 


h 


g no 


goo 




n 


h 


p X 1 olu 


on 


low 


0 


' P 


0 


h 




Z 0 


h 


0 j . n h 


X 


pi 


h 




oun 


hun 




p X 1 


n 


1 ng h. 




























Fig. 1. n X pi 

p n o n 



V n h n ly houl po wh h h 



n ou op on 1 k yp 1 u 



gh 



1. h op o ( g n ly ) 
oonoohp no 

2. h n o ng g o h 
h y o lo li 3 

oh y 1 V 

3. h g on g n n 

ong X o y y. 

4. h ong y y x 

o p o g n o 

op o h g n 

p n . o h n h 



P 

on 
g on 
n 
h 



u 

3 



3 g on o 

n no 

n h g . h 
v2.0 o 
g n u V 

o g oup h u V 
u n o h 
1 o n wh 

n qu u o h 



wh h h y w h 

on wh h How 
yp 1 o g 

h o n h 

n h g n 
o j . h 
h n 
op 



g 



pi 



O O V w. 








2 Recovering the Symmetry 



2.1 Related Work 





V ou 


pp 0 h 


0 g oup ng u ng 


y 


y w 








on 


h “ 


k w 


y 




y” 0 1 


k w 


y 


y 


y 


g 


n 






y 


k ng 


pi 


n 


0 j 


w h 1 


1 y 


y n 


pply ng n 0 




hog 


ph 


P oj 


on. 




n 


non g n 


p p V 


p oj on 


1 0 


P 0 


u 






k w 


y 


y- 




0 


on pp 0 


h 0 u 


0 n p 


op 




0 


h 




h p 


0 


on 


n h 


y 




y p 


n h n 


pply 0 




V lu 


on 0 




y 


yqu 1 


y ov 


11 


How 


p 


0 h y 


y 


0 


ov 


h 








P 


on. 






no 


h pp 0 


h 12 


h h p 






0 


V 




0 


n 


V 


n w 




g 


y kng li 


L n h pow 


0 h 


V 


0 . 


h 


n 


h 


u h 


V 


g 


hown o h V 


n wh h 


h n 




on 


h 


o 


g 


n o 


off 




o ng 


0 


how 


ny y 


h 0 


j xh 




. h 


n ly 




How 


h nu 


0 


X 


n V n u 


lly h 0 


n on 


0 




OV 
















Mo 


n ly 


ny u ho 


11 3 


h 


V 


kl 




h 


P 


0 1 


0 




ng 


; u h 


n 0 


nog oup ng u ng 


n 


0 






h 


0 j 


1 


. 0 


X 


pi 


h n 


poll 3 


lo 


1 


n 


y 








n 


V lu 


h 




1 y 


long h 


U V 


1 n 




on. 


y 








wh 


h 


1 


long X 


n 


on 0 u V 


1 


1 


n h 


n 


ho 




wh 


h 


V y 


■lo 1 


z 


n 


h 1 


0 X 


ly on 


n 


h 


n 


0 




. 0 X 


pi 


P 


0 


U V 


wh h 


g n 0 


ngl 


ul 


0 no 


g V 


un qu 


X 0 



y y- 



2.2 Unconstrained Problem 

nv g h un on n ov y o pi n y u v g n 



un 


P oj 


V y 


n 


1 


wo 


k 


4 . Ou 


pp 0 h 


. w 




on 


n 


ng 


P 


n on 0 


h 


u 


V wh 


h 


w 


nv n 


0 h 


y 




y n 


0 




0 


P oj 


V y 


n u 


1 


p 


n on 


h 


0 


nil 


on 


po n 


on 


h u 


V . ow 


V 


h 


OV 


y 0 


u h po n 


on u 


V 


hghly 


n 


^ 0 


no 


h ng 


n 




on 






0 


1 n 


h 


1 


0 




ho 


n wh 


h h u 


V 


g 


n 




p 


z 


y u 


ng 


h 


ng n 




wo 


nfl 


on po n 


. h 


u 


V hu 


p 


z 


y 




h 


V 


lly 


w h 


ny 


n n 


0 


0 


h 




u 


V g V n 


ju h 


0 


pon n 


0 


h 


P 0 


nfl on 


po n . 


h 


n 




n 0 


y h n 






V ly 


n 


0 


olv 


0 1 




0 


P oj 


V 




ho wh h 


opolog 


lly 


qu V 


1 n 


0 h 


lo 


n 


n 


0 




















On 


p 0 1 


w h 


h 


PP 


0 


h 


h u V 


p n 


on. 


nfl 


on po n 


y no X 0 


X 


n 




on 


0 


u V n 




ul 


0 


ov 


0 U ly. 


h 


V p 0 




0 


xp 




n 


u ng 


non lo 


1 


n 


nv 


n 


U V 


P 


n on 




on 


h 






pp ox 


on 1 . 


n 


h 


0 gn 


1 0 





u V pp ox y polygon wh h polygon v ho n o 1 

long h u V . h pp ox on g n u v ly y 1 ng h 

g h po n on h u v wh huh oh polygon 1 pp ox 



on. 







h n 1 pp ox on ju h 1 n h ough h u v n po n . h 
pp ox on y o pu v y p ly n g v goo v u 1 h 
whh uv. gono uv wh h hghuvu no pi 

o n ly n 1 ng h h n ho w h low u v u . 

2.3 Invariance of the Ramer Approximation 

n g n ng h pp ox on w no h o ny g n o u v h 
po n p k h n X polygon v x ho wh h h p p n ul 

n o hlnjonngh uv npon x 1. hponwll 

uhh h ngn oh uv p lllohlnjonngh npon. 



u p 


11 lln 


n p 11 1 un 


n n 


no 0 h po n 


P k 


on n 


n n 


0 U V 


W 11 0 


pon 0 


h po n 


on h 


0 g n 1 


u V . 


















1 0 u 


p y h 


n 


ng on 


on 0 h pp 


ox 


on. L 


^0 


h 


w n h 


u V : 


n h 1 n 


jo n ng n 


po n . 


L An 


h 


w 


n h u V 


n 


pp ox 


on wh n h 


n 


p n n 


h pp ox on. h n 


n h 


pp ox 


on wh n 11 


low 


on 


n V lu . 


n h 0 


0 


1 0 


n nv n 


h 


n on 


on 


on gu 


n 0 g V 


n 


n nv n 


1 pp ox on. 




n 


h p n 


0 0 lu on h 


p n 


on w 11 no 


nv 


n n 


h n 


po n 


n h 1 n 


g n 


u h 


pp ox 


on. 


ow V 


w h 


V h 0 


u w 


h h 


ho n 


ov ng y 




u h 


Ing 


n u 


fly wng 


hown 


n gu 2 


. h jo 1 


on 0 h 


PP 0 


h h 


0 no 


1 


hop 


0 h ng nil on po n . 


h 


n 


0 wh 


y h u 


0 lo 1 


u V p op 


u h 


olou 


n 


X u . n 


gu 2 


0 


h lo 1 


olou p op 


0 pu 


long 


h on 0 u V n 


h pp 


ox on 


n h u 


0 


on n 


h 


h. 














u 


h h 


ho n 


only 


ov 1 


0 y 




n lu ng 


n 


n 0 p oj 


V no 


on 0 pi n 


u V . w 


xplo 


h u 


0 


on 1 on 


n 0 olv 0 


3 1 


1 y y- 







2.4 Constrained Problem 

noupol o nh ovyoy y uh pi yh 

wo on n V 1 1 . h known n h y y u o 

ou pi n p p n ul o h g oun pi n . u h h g oun pi n 
x ly h 3 g on o n p ov y h n ly . 
h on n How uopoj h g konohg oun pi n . 
y u V wh h pp ox ly n h g oun pi n w 11 h n o 

2 ul nl ly yonhg oun pi n . h llu n 

gu 3. 

ho no u ppv .hpojonpo y 

olv ng nu lly o h n on o h y w h h 3 g oun 

pi n . 




Constrained Symmetry for Change Detection 187 




Fig. 2. pi n y 

h hu h floo 1 
g n oh 
y y n o 



y OV 
o 

g . n wh 

OV h 



u ng h o 
o ly. n 1 k 

h g 

n ppl 



pp ox on. 
h o g n 1 g 
h ong n 



u h o h V ng OV 
pi n w n u h on n 
p p n ul o h g oun o 
h g . 



h 1 ly yxonhg oun 

h hSplno 1 ly y 

ho 3 onuonohuv n 



2.5 Recovering the 2-d Symmetry Axis 



g 


on 


w 


p 0 n h 


g u 


ng 


h 


g n on Igo 


h 


0 0 


hw 11 1 


n 


gh 1 n w 




0 h 




h In 


w 


h n 


P oj 


on 0 


h 


g oun pi n 0 


OV p 


P 




/ k w n 


h op 


100 


long 


In L 


1 


0 u h p 0 


ng. 


h g 


V 


n pp ox 


1 


1 


u 1 


n y 




y n 2 hown 


n gu 


3. 


h 


y y h 


only 


wo 


g 


0 


0 


ng h p 


0 


h 2 




X 0 y 


y on 


h 


pi n . 


ough 


P 


hn qu w u 


0 n 


h 




ong X 0 


y 


y- 









b) 

Fig. 3. ong In n 
h g oun pi n u ng 
u 1 n 1 1 y 




y 



long h y on o 

h y o n pp ox 





on 


P 


0 1 n g n li n I 2 




hown n gu 


4. 


h wo 


1 n 


g 


0 


h 0 h un h y 


y 


woul h V n 


X 


0 y 




y g V n 


y h 


ng 1 n 6. h k w 


0 n 


u 


h h 


P oj 


on 0 


h n 


I 2 on 0 6 ov Ip. 0 ng w 


P 


0 n 2 


1 n 


ough 


P 


wh 


h oc 


in n h p 0 


pon 


0 h ngl 


n 


1 ng h 



o hppn ul o hognohln. huho zon 1 1 n w p 
n y po n long on x n 1 n h ough h o g n w p n 
y po n long h o h x . 

On VO w u ul o 11 p o 1 n h ough p w 

00 h nhnno Iz oh uno uonoln np 

g V un o ough p . h n non x 1 upp on w p o 
n h op 20 po 1 x w x 

hpu V X w hn nk o ngoh ollngholn 
xpl n y hy y. huo hlnlGih fl on I w 







b 







Fig. 4. hp o gin VO oh nglntipov h h 

poj ono hlnonoh nglnovlp .pndx 

oun n 11 1 n wh h w oil n w h I w ov o L. h 

oil n In w h n p oj on o Z n h u o h p oj 1 ng h 
w lul .h u knov 11 In nhg oup L w h o o 
pu V X o y y. 

2.6 Grouping by 2-d Symmetry 

n ngonohh2y y pow ul g oup ng h n . 

gu on h ov yo y y x o oufl g 130 

. h glnhognl g n n o noy 

u h ong xoy y 11 o nh u wh h h 

un h y y ho wh h y on o un on 1 n y. 

h ng n ng n po on o h o 1 w ov 

3 Recovering 3-d Shape from Symmetry 

On 2 xoy yh nx only 11 p o 

ov h3 hpoh uu.hxoy y ov n2x 
p oj on o 1 n X wh hi onh uplno 1 ly yP 
n 3 h 1 n ng o unknown h gh ov h g oun pi n . 1 o P 
u o ppnul ohg oun pi n . h wo on n h 

pi n P o on n on 1 p n 1 o pi n wh h p o h p n 1 








0 pon o h po on X long h h y . h ly o pi n 

p p n ul o li g oun n w p ow h X w p o 

li g oun pi n ow h . n li y o j xp 

01 whnh gonon gvnyh nly hpnlopln o 

y y o on 1 o oun 





0 


g V 


n pi 


n 


0 y 






y 0 


h p n 


1 n h 


known 


0 1 


h 


P pol 


g 0 




y 0 


h 


0 


j 


ully 


n . 


gu 


llu 


h 


0 




on 0 


h 


P pol 


u 


V 


. po 


in u on 


g n 




u V 0 


pon 


w 


h 


y 


0 


h 






R. 


h 3 


g n 


ng po n 


U 


y 


ny po n 


on 


R. 


h 


u h 


pc 


t n 


n 




11 


n h 


pi n 0 


y 


y P 


n h n 


P oj 


u 


ng 


h 


V u 


1 




R 


wh h 


h fl 


on 


0 h 


R 


n 


P. 


hu 


h po n 


u 


P 


0 


h 


g un 


R P 0 ho 


gn 1 


y R. h 


u 


V 


h 


P pol 


1 n . 


0 


P P 


V 






gh 1 


n wh h 



p h ough h V n h ng po n o h y y. 



li p pol go y on n li po 1 o pon n un g v n 
y y- o po n wn oh ogn uvnh g 

only ho po n wh h 1 on h p pol In y o pon . hi h 



u 


n 0 


1 h 0 pon n 


only on o 


h po n 


oun 


long 


h 


p pol 1 n 


0 h po n n u 


V n u 


n h 




ul pi 


po n 


long h 


p pol In. ny wo 


u V g n 


wh h 0 


y h 


P pol 


on 


n y 


u 0 on u 


3 u V V 


n h y 


no 


h u 


0 


pon n 


. u h on n 


h n qu 










n 


ng 0 no h wh 1 


0 11 n oil 


14 13 


oun 


h 


U V 


ng n 


1 0 h p pol 1 n 


h n h 0 


pon ng 


U V 


U 1 0 




ng n 1 


h 0 no nv 1 


h p V ou 


n . 


ov 


h 3 



u V How olnh on oh y lyoo pon 

ng3 uv n gn o nyp o2 uv gn wh ho y 

h p pol on n . ow v n lyouv gn wx 

lu u h o pon n o on on h on n ng only on ho 

h wh h g V un qu 3 u v . 









Fig. 6. 


h 


p pol go 




y 0 


g V n 1 


1 u 1 n y y n 3 . 


ngl 


2 


po n on 


h 


1 


y n g n 1 


p 0 ny 2 po n u on h 


gh 


h 


po on 0 


h 


3 g n 


ng po n 


U ov long h y 


R. h 


u 


h pp ng 




llu 


h . 





On 


P 


0 po n 


h 


n n 


0 pon 


ng 


h 3 


u V y 


on 


u 


. 0 


P 


p V 


1 y n 


u 


10 9 


how h 


lo 


0 


olu on 


X 


0 n ng h 3 


po n wh 


h 


n z 


h 0 



n h g . n w o no ou olu on o p p v w 

ply n z h 2 o u ng n nu 1 n z on ho 

h k ng o k u h onv gnw oynhh ul 

ng o w 11 nough. o o u ho woul o 1 n z ou 

on hlolp p V oln hn olv u ng 1 y u 

ng n y o onv g n . 





u h p pol 


g 0 


y 0 on 


u 11 po 


13 u V u 


ng 0 


h xp 


n 


h h 


X 0 y 


y ^ 1 


on h g oun pi 


n . 


h 


u p on w 


11 


ul n 


0 0 


on 0 h 3 


on u on 


h 


n u 


0 wh h p n 


on h 


0 


llu n 


h h gh 0 ho 


g 


n 1 2 


y y 


ov h g 


oun pi n 


. h ff 


w gno n ( 


3U 


xp 


n n w 


11 




nun 


wo k. 








pi 


h 


u V n 


0 h po n oun 11 


. 0 pon ng u 


V 


po n 


n h g 


n 


on X lu ng ho 


0 pon n 


wh h g V lo 


iiy 


n 


n 3 


U V 


. On 


11 u h po 


1 0 pon n h n 


1 


ul 


w h n 




11 


u V po n 


wh h h 


ny w ly p 




po 1 


0 pon 


n 


. h 


n ng 


u V g n 


0 pon n w 




h n u 


0 g n 




3 


u V . h 


U V 


hown n gu 


0 


h 


hown 


n 


gu 1. 


0 h 


h g on 0 


h 0 1 ng n 


1 


0 h 


p pol 1 n 


0 


h y 


y w 


no ov 


h w g on 


0 



u V wh h ul n ly o 3 olu on . ow v h n po n o 







h 


w 


ng 


n 




ipi 


n 


w 




ov 


w 


h 


on 


0 


u 1 g 


1 


ng 


up 


0 


h 


Ipl n 




h 




n po on 0 h 


u 


1 g w 


n 




u ly 


OV 






u 


0 


h 


op 


n 


h 


h 


n 


0 on 




u h 2 




u 


V w 


u 


iiy 


0 


lu 


ng 


on 


on 


n 




no 


0 pon 


0 


h 3 




u 


V . u h 


0 


n 


h 


g 


on 


oun 


h 


ng n 


0 h 




h w 






ny po 1 


0 




pon 


n 


0 




h 


g 




h 


ul n 


lu 


0 3 


u 


V 


ju ng 


ow 




h 








ov 


h 




on 


0 


h 


















Fig. 7. oy og oh onu 3 uv.oh 

nponohwngn Ipln.h vluvgn 

no o pon n n h g on oun h ng n oh wh 



0 “j 


” 0 u V 


ov h no 


w p ng ow 


h 


on 


h 


0 h 


pi n 0 y 


y- 















on 




on u 


on 0 


h j n 


gu 


hown 


n gu 


9. 




h 




qu 


lo 0 


h pi 


n 0 h y 




y n 0 


n h 


3 


lo on 0 


u V po n 


wo 


u 


h on u 


on 


11 qu 


1 . 


h 


h 


1 


ownw w 


p 0 h 


Ipl 


n w 11 p u 


h 


h p 


g 


0 h 


u 


1 


g n h 


0 kp 


n h 


h no . h 




1 0 


pon n 


n h 
3 






oun h 


0 kp 


g n 


ng ul pi 


U V 


ov h 




n 




p 




h 


n w 




u n ly nv 


g 


ng u h 


on 


n 


wh h 




y 


ppl 


g V n 


h 


u V n 3 


0 


n 0 


0 




pon 


n 




, On po 


1 pp 0 


h 


0 P 0 




oun 0 


pon 


ng 











Fig. 8. h j gwh2 xoy yri3 uupoj 









C. 












Fig. 9. on X pi how ng h 3 on u on p oj on k n o 
hognl gn oy og oh onu 3uv. 

h oun h o kp h v 1 1 o pon n u h h 1 

w p o h Ipl n w 11 ov h p o h w ng . 







u V n How only un qu o pon n o huvpon. n 

ov g n wh hgvn nn3 uv y ung ghln 

w n h n po n . ho o pon n wh h xpl n h g 
Inghuvwllhn hon h o n p on. 

4 Further Work 





0 h 


ng 






on p 


P 


V 


h 


3 


U V 


n 




on u 




0 




ngl V 


w I 




h 


0 1 


0 




n 




g ) 


n 


h n J 


) oj 


n 


0 


n w 


V 


w ( h 


u 


n 




g )• 


h 




uppo 


0 


h 


P oj 




U V 


h 


n h 


0 


j 


11 


h 




no 


h n 


h 


0 j 




h 


ov 




u 


n ly u 




2 


V 


on 0 


h 


Igo 


h w 


h n 


h 


0 




y 




n wh 


h 


h g 


n 


on 




u 


0 


1 


on 


. h g oun 


pi 


n 


n 


0 


w 


P oj 


n 0 


h 


n w 




g n 




h 




h 


on 


u 


on 


0 


y 




y w 


11 


How u 


0 


ppiy 


h 


PP 


0 


h 


0 3 


y 




1 


0 j 






on 


1 V 


w n 


1 


0 


u 


on 


g 




0 


n 


h 


3 


0 


1 0 


h 


0 j 




g wh 


h 


0 


lu ng 


on 


on 




no 


u 


3 


u 


V 


n 


n 


1 


n 


0 


u 


0 


g 


n 


u 






p n 


on 0 


h 


0 j 


















h ff 




o 


h 


3 lo 


on o 


h 


2 


X 


o 


y 


y X w 


gno 


n 


h xp 




n 


. Mov ng h 


X 


long h 






y 


w 11 


ul 


n 




ff n pi 


n 


o 


y 


y 


n 3 




n 


hu 


n o 




o 


on 


o h 3 




on 




u on. 


1 


on 


0 h 


0 


pi n 


0 


h 


P 


n 1 0 


pi 


n y 




0 



nv g 

5 Acknowledgements 

onv onwh un h lyw nflu n 1 u ng h 

o V g o h p p . 

References 

[1] H. Blum and R. Nagel. Shape description using weighted symmetric axis features. 
Pattern Recognition, 10:167-180, 1978. 

[2] Bernard Buxton and Roberto Cipolla, editors. Lecture Notes in Computer Science, 
volume 1064, Cambridge, UK, April 1996. Springer-Verlag. 

[3] T-J. Cham and R. Cipolla. Symmetry detection through local skewed symmetries. 
Image and Vision Computing, 13(5):439-450, 1995. 

[4] R.W. Curwen and J.L. Mundy. Grouping planar projective symmetries. In Image 
Understanding Workshop, pages 595-606, 1997. 

[5] R.W. Curwen, C.V. Stewart, and J.L. Mundy. Recognition of plane projective 
symmetry. In International Conference on Computer Vision, pages 1115-1122, 
1998. 

[6] O. Firschein and T.M. Strat. Radius: Image understanding for imagery intelligence. 
Morgan Kaufmann, 1997. 

[7] M. Fleck, D. Forsyth, and C. Bregler. Finding naked people. In Buxton and 
Cipolla [2], pages 593-602. 







[8] S.A. Friedberg. Finding axis of skewed symmetry. Computer Vision Graphics and 
Image Processing, 34(2):138-155, May 1986. 

[9] R.I. Hartley and P. Sturm. Triangulation. In Image Understanding Workshop, 
pages II:957-966, 1994. 

[10] R.I. Hartley and P. Sturm. Triangulation. Computer Vision and Image 
Understanding, 68(2):146-157, November 1997. 

[11] T. Leung and J. Malik. Detecting, localizing and grouping repeated scene elements 
from an image. In Buxton and Cipolla [2], pages 546-555. 

[12] G. Marola. On the detection of the axes of symmetry of symmetric and almost 
symmetric planar images. IEEE Trans. Pattern Analysis and Machine Intelligence, 
11(1):104-108, January 1989. 

[13] J. Porrill and S. Pollard. Curve matching and stereo calibration. In British 
Machine Vision Conference, pages 37-42, 1990. 

[14] J. Porrill and S. Pollard. Curve matching and stereo calibration. Image and 
Vision Computing, 9:45-50, 1991. 

[15] U. Ramer. An iterative procedure for the polygonal approximation of plane curves. 
Computer Graphics and Image Processing, 1:244-256, 1972. 

[16] C.A. Rothwell. Object recognition through invariant indexing. Oxford 
University Press, 1995. 

[17] C.A. Rothwell, J.L. Mundy, W. Hoffman, and V.D. Nguyen. Driving vision by 
topology. In Symposium on Computer Vision, pages 395-400, 1995. 




Grouping Based on Coupled Diffusion Maps 



Marc Proesmans^1 and Luc Van Gool^1,2 

^1 Univ. of Leuven, Belgium 
- L K wi z rl n 



Abstract. Systems of coupled, non-linear diffusion equations are proposed
as a computational tool for grouping. Grouping tasks are divided
into two classes, local and bilocal, and for each a prototypical set
of equations is presented. It is shown how different cues can be used
for grouping given these two blueprints plus cue-specific specialisations.
Results are shown for intensity, texture orientation, stereo disparity, optical
flow, mirror symmetry, and regular textures. The proposed equations
are particularly well suited for parallel implementations. They also show
some interesting analogies with basic architectural characteristics of the
cortex.



1 Introduction 

Vision, more than any other sensory input, enables us to cope with the variability
of our surroundings. The performance of biological vision systems is unrivalled
by any computer vision system. One task they are particularly better at is
so-called 'grouping'. This is the crucial step of identifying segments that have a
high chance of belonging together. Grouping acts as a kind of shortcut between
low-level features and high-level scene interpretations, quickly assembling
relevant parts. Unfortunately, little is known about the underlying 'computational'
principles.

In this paper, an attempt is made to formulate a set of working principles that
seem to underlie biological vision. These are discussed in section 3. Then, in
sections 4 and 5 a computational framework is proposed that seems apt to turn
these principles into computer vision algorithms that work on real images. This
is an important restriction, which rules out detailed modeling of the actual neural
processes. The human visual system owes at least part of its power to the huge
amount of processing units and connections. The result is not a faithful copy of
biological visual processing but a model that exhibits useful functional analogies.
But first, section 2 looks at the implications of defining grouping as redundancy
reduction.

2 Grouping as Redundancy Reduction 

Perception and grouping in particular have often been regarded as processes of
redundancy reduction. Statistical information theory has been invoked to supply



D.A. Forsyth et al. (Eds.): Shape, Contour LNCS 1681, pp. 196-213, 1999. 
© Springer- Verlag Berlin Heidelberg 1999 







objective functions to be optimised by these processes, such as Shannon's entropy
and mutual information (for a recent example see Phillips and Singer 4).

Yet, statistical information theory does not tackle the core problem, which is
to define redundancy in the first place. Consider the following strings of bits:
‘0 0 00 00 0 0 00 ’ and ‘0000000000 0’. In terms of Shannon
entropy, both yield the same 'redundancy'. There are 50 0's and 50 1's in both
cases. The second string definitely seems more ordered, i.e. seems to contain more
redundancy, however. Dealing with such differences is the realm of algorithmic
information theory. Of course, one could assign new codes to pairs of subsequent
bits, like '00'-'01'-'10'-'11', in which case the second bit string would suddenly
seem very ordered from the viewpoint of statistical information theory as well.
But this requires the string to be recoded appropriately.

Algorithmic information theory deals precisely with this problem. Simply
put, it tries to find the shortest computer program (which amounts to a bit
string) that produces the given bit string. It would not be easy to find a program
that produces the first bit string and that would be shorter. For the second string
a simple repeat '10' eleven times would suffice. The longer such a regular string,
the greater the gain in bits would be. The real complexity of the data is defined
as the length of the shortest program that generates them. Hence, the second
string is of less complexity than the first.
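The contrast between the two notions of redundancy can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the original text: the shortest-program length is not computable in general, so the length of a `zlib`-compressed string is used as a crude stand-in, and the two 100-bit strings (fifty repetitions of '10' and a seeded shuffle of the same symbols) are chosen for the example only.

```python
import math
import random
import zlib
from collections import Counter


def shannon_entropy_bits(s):
    """Per-symbol Shannon entropy (in bits) of the symbol frequencies in s."""
    counts = Counter(s)
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def compressed_size(s):
    """Bytes needed by zlib; a rough proxy for description length."""
    return len(zlib.compress(s.encode("ascii"), 9))


# Assumed example strings: same symbol counts, very different order.
regular = "10" * 50
symbols = list(regular)
random.Random(0).shuffle(symbols)  # fixed seed keeps the example repeatable
irregular = "".join(symbols)

for name, s in (("regular", regular), ("irregular", irregular)):
    print(name, shannon_entropy_bits(s), compressed_size(s))
```

Both strings contain fifty 0's and fifty 1's, so the frequency-based entropy cannot tell them apart, whereas the compressor needs fewer bytes for the periodic string.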

The ideal grouping device would solve this problem from algorithmic information
theory. Such a device would come up with an ordering principle (the program) that
maximally compresses and thereby 'explains' the data. Algorithmic information
theory states that only a very small fraction of strings will allow an appreciable
degree of compression 3. This underpins the 'non-accidentalness' idea in that
random data have a very low chance of showing appreciable order 8, 9.

Unfortunately, algorithmic information theory also shows that in general maximal
redundancy reduction cannot be achieved. Quoting from Chaitin 3: 'The
recognition problem for minimal descriptions is, in general, unsolvable, and a
practical induction machine will have to use heuristic methods'.

One might argue that image data are far from random and that appreciable
degrees of redundancy reduction are possible right away, e.g. based on correlated
feature values at nearby pixels. Segments can quickly emerge that way and
higher order groupings could be formed based on these initial results. One could
abandon further attempts once the search for further order gets difficult and
hence the problem of having to deal with random data is not an issue. Nevertheless,
almost all grouping phenomena in human vision have been demonstrated
to work with stimuli like random dot patterns. Hence, the human visual system
doesn't seem to constrain the input data it can handle, but rather the regularities
it can find. Indeed, introducing heuristics in the search for order is the only
approach that can prevent the system from performing exhaustive search and
thus becoming unacceptably slow. Thus, we have to accept the idea of applying
a restricted class of regularity detecting schemes and letting other types of
redundancy go by unnoticed.







An example of how human vision also exhibits such behaviour is given by symmetry
detection. See fig. 1. Mirror symmetry in random dot patterns is known to
be very salient. However, if a small region around the symmetry axis is replaced
by a non-symmetrical random dot pattern, the overall impression of symmetry
is seriously weakened. Although the level of redundancy has been decreased only
slightly by this simple change, the redundancy is much harder to detect 6.




(a) (b) 

Fig. 1. (a) asymmetric random dot pattern; (b) symmetric random dot pattern; 
actually (a) is symmetrical except for a vertical strip around the symmetry axis. 



Algorithmic information theory gives justification to a tradition of listing separate
grouping principles known from introspection and perception research. Examples
are the 'Prägnanz' rules of the Gestaltists 23 and Lowe's non-accidental
properties 9. Grouping methods should focus on mechanisms with maximal
effect, avoiding a proliferation of grouping rules. Then computation time and
resources are well spent. Emulating the grouping principles that nature applies
seems a sound strategy to that end. The next section deals with the guidelines
one might extract from the brain's architecture and behaviour.

3 Computational Grouping Principles 

This section tries to draw conclusions from the best grouping systems known
to date: biological vision systems. It focuses on macro-properties of visual
processing, as we are interested in computational principles rather than their actual
implementation in the brain.

Massive, but structured parallelism. Neurophysiological observations of mammalian
vision systems show that neural structures perform a massively parallel
kind of processing. Moreover, the neuronal substrate is not like a soup of millions
of these small processors. From the retina, over the lateral geniculate nucleus







(LGN), up to the striate, prestriate, and further cortical areas, one finds a high
degree of organisation. Neurons work in parallel within layers, which are
simultaneously active and together form modules and areas, which in turn jointly
create the visual percept. This fine-grain spatial parallelism in combination with
a functional parallelism also seems a prerequisite for the fast responses of human
vision.

Local Interconnections. Parallelism does not imply highly connected networks.
In the case of the brain, there are an estimated 10^15 synaptic connections, for an
estimated total of 10^11 neurons. This leaves us with an average of about 10,000
connections for each neuron. Then also take into account that there typically
are multiple connections between the same two neurons and the picture is one of
sparse rather than dense connectivity. In summary, a neuron's connections are
applied very parsimoniously, with an emphasis on local connections in a retinotopic
sense (and within a layer and area also in a physical sense for that reason).

Data Driven Processing. The responses of human vision to random dot data
suggest there is a strong data-driven aspect to early vision. Such data can yield
strong impressions of symmetry, coherent motion, and depth. Treisman 9, 20
carried out experiments with artificial stimuli, where simple cues like colour,
orientation, etc. were sufficient to let deviating components pop out immediately.
It is hard to see how such performance could be due to the extensive use of
expectations about the world. Far from suggesting that bottom-up and top-down
processes ought not to collaborate, there nevertheless is good reason to believe that
the initial visual stages depend primarily on the retinal input.

Specialized modules. Both neurophysiology and psychophysics indicate the
existence of more or less independent channels for the processing of different aspects
of vision (colour, orientation, motion, depth) 2. The existence of specialized
visual modules and areas in the cortex is well-established by now 25. This
specialisation of areas is backed up by the selective connections between them.
From this one can conclude that the visual information is processed in parallel
in different areas or modules therein, and that special attention is paid to each
of a number of basic cues.

Bi-directional coupling. Neurophysiological terms like 'visual pathways' suggest
a model with 'lower' areas feeding into 'higher' areas. But as Zeki 24 puts it:
"most connections in the cortex are reciprocal, and in the visual cortex there is
no known exception to this rule". Such reciprocal connections form the anatomical
basis for feedback and can also help to arrive at an effective integration of
information (cue integration, data fusion).

Non-Linearity. Many of the convolution masks used in computer vision have their
counterpart in the brain, so it seems. However, already quite early in the visual
cortex neurons are found with a markedly non-linear response to the intensity
and spectral pattern in their receptive fields. Complex and hypercomplex cells







are well-known examples. Also, one has become increasingly aware of the influence
that stimuli in a wider surround can have. Effects like illusory contours also suggest
non-linear processing, and all the more so as they may appear or disappear upon
small changes to the stimulus.

Explicit representation of boundaries. Basic features such as luminance, colour,
motion, depth, etc. can all generate the impression of discontinuities that separate
regions homogeneous in that cue 2. The presence of orientation sensitive
cells in areas that are specialised in the detection of different cues 24 can also
be interpreted as indications that contour mechanisms operate in each of these
basic feature maps separately. It seems there is a distinction to be made between
at least two types of processes: those that detect homogeneity of some sort within
image segments and those that detect their boundaries. Zucker 26 referred to
these processes as Type I and Type II processes, resp., and Grossberg 4 called
them feature and boundary systems. In the case of the luminance cue, the brain
has the ability to form illusory contours 22.

Local and Bilocal grouping. One can distinguish two types of detectable order.
One corresponds to regions with homogeneous local characteristics, such as
luminance, colour, or texture orientation. The second type combines cues at two
different locations. Mirror symmetry detection is a good case in point. It requires
the detection of similar cues at position pairs. Similar processes seem called for in
the case of Glass patterns 2. The visual system seems to extract displacement
fields (vector fields). Motion and stereo are cues where a similar process would
be useful, with the addition of having the two locations defined at different
moments in time or in the input from two eyes. The particular relevance of both
local and bilocal processes could also be reflected in the particular importance
of 1st and 2nd-order statistics in Julesz's texture segmentation experiments. It
is also interesting to speculate about the local/bilocal divide as an alternative
for the 'what' and 'where' pathways, resp.

Maximal usage of a limited number of grouping mechanisms. Separate, specialised
modules do not imply that the underlying implementational mechanisms are
completely different. From an evolutionary point of view one could expect that
a successful computational scheme is duplicated to solve other tasks. There are
good anatomical indications for this 7, emphasised by the term 'isocortex'.
This suggests that the different types of grouping exhibited by the brain can
be implemented as variations of a few basic blueprints. Of course, variations
between areas are to be expected as a result of specialisation.

Compliant regularity detection. Maximal usage of grouping principles also implies
some degree of flexibility. Grouping methods should tolerate deviations from the
ideal regularities, in order to be useful in natural environments. Hardly any
of the observed, real-world regularities are of an all-or-nothing nature. Human
observers are e.g. capable of detecting mildly skewed symmetry in random dot
patterns.







In conclusion, the aim of this work was to create a grouping framework according
to the following guidelines:

— Allow fine-grain parallelism,

— with mainly local interconnections between retinotopically organised nodes.

— The processes are primarily data driven, based on a restricted number of
basic image cues.

— Processing should be carried out by different, specialized feature maps,

— including non-linear processing,

— bidirectional coupling between maps, and

— explicit representations of both feature homogeneity within segments and
variations that signal boundaries.

— Grouping processes will be of two main types - local or bilocal - and

— will be based as much as possible on but a few computational blueprints.

— They will also allow for deviations from ideal regularities.

In the next two sections, two blueprints for local and bilocal grouping are
described. For each, specialisations towards specific grouping applications are
discussed and results on real images are shown.

4 Local Grouping 

Probably the most straightforward example of a local grouping process is the
detection of regions of approximately constant intensity. This first example is also
selected because it best shows the relation of the proposed grouping framework
with regularisation based techniques and anisotropic diffusion, of which it could
be considered a combination.

The original intensity I(x, y) of the different image points with coordinates (x, y)
will be changed into new values f(x, y) such that small variations are suppressed.
The function f(x, y) will be referred to as the intensity map. Simultaneously, a
discontinuity map d(x, y) is constructed. Both processes are governed by a
non-linear diffusion equation and they evolve while influencing each other through
bidirectional connections.

These equations have been derived as follows. Regularisation functionals are
the point of departure. They typically impose a solution that should strike a
balance between smoothness and faithfulness to the input signal (I in our case).
The more sophisticated regularisation schemes also include discontinuities but
penalise their creation in order to avoid over-fragmentation. Such functionals
typically are non-convex, which makes them difficult to extremize. The most
studied of these functionals is 2,

$$E(f, B) = \iint_{R \setminus B} \left( \|\nabla f\|^2 + \beta (f - I)^2 \right) dx\,dy + \nu |B| .$$

In the image region R boundaries B can be introduced at discrete locations.
The first term pushes for an f that is smooth, while the second will keep it from







drifting far from the original intensity. Where boundaries are introduced, these
terms are put to zero. The third term penalises the introduction of a boundary
pixel. This functional regularises f and B simultaneously.

There are at least two problems with such functionals. The first has already
been pointed out and is the difficulty of finding the optimum. A second, major
problem is the unexpected characteristics that the optimum may have. In the
case of this particular functional, edges will always intersect as triples in vertices
with 120° between them. This is neither desirable nor intuitively clear.

There is not much one can do about the second problem if one works via
functionals that prescribe global behaviour. Here we work through local prescriptions
of grouping behaviour. The first problem can be reduced by going via related,
convex problems. The graduated non-convexity technique (GNC) captures the
solution as a limit case of a series of convex approximations 2. Ambrosio and
Tortorelli introduced a discontinuity indicator d which is a kind of smoothed
version of B. They replaced the term ν|B| by




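The first term is a smoothing term and the second term tries to keep
boundaries localised. Shah 8 proposed to replace this single functional by a pair
of functionals that are minimised together. The original, difficult optimisation
problem is replaced by the solution of a system of diffusion equations, with f the
intensity map, I the original intensity and d the discontinuity indicator:

$$\frac{\partial f}{\partial t} = \nabla^2 f - \frac{1}{\sigma^2}(f - I), \qquad \frac{\partial d}{\partial t} = \rho \nabla^2 d - \frac{d}{\rho} + 2\alpha (1 - d) \|\nabla f\| .$$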

These evolution equations are calculated for all image pixels. At each iteration,
only information from neighbouring pixels is needed. The first equation strikes
a balance between smoothing the intensity map (first term) and keeping it close
to the initial intensities (second term). The second equation governs the boundary
strength. Again, the discontinuity map is smoothed (first term) and kept small
(second term) unless the local intensity gradient is large (third term). Near edges
it will be pulled towards 1, elsewhere towards 0. Spatially, it varies smoothly
between these extreme values. Note that the intensity map influences the
discontinuity map but not vice versa.
A drawback is that both the intensity map and the discontinuity map are blurred.
This problem can be alleviated by replacing the linear diffusion operators by the
anisotropic diffusion operator of Perona and Malik 3. In the first equation,

this is iv( — , with e re sing fun tion of ||— || or n the se on 

equ tion simil r h nge is m e, with fun tion th t e re ses with — — 



The mo ul tion f tors -0 n n e i erent fun tions of They n e 
hosen ^ n = , there y st ying lose to h h’s equ tions f one 

refers stronger e ge ete tion n less v ri tions within regions ip = n 
= — is etter hoi e 6 The e e t of the mo ilie equ tions is shown in 








a b c 



Fig. 2. a: Part of a SPOT satellite image (of an agricultural area), b: Intensity 
map (f) = constant, {= — ). c: Discontinuity map. 



fig. 2. The original image is shown in (a), the intensity map after 30 iterations in
(b), and the discontinuity map in (c). The noise in the intensity values has been
reduced, while edges have been sharpened. The discontinuity map has values close
to 1 (bright regions in (c)) near the boundaries of regions with homogeneous
intensity.
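The interplay between an intensity map and a discontinuity map can be illustrated on a one-dimensional signal. The sketch below is not the authors' scheme: it is an explicit-Euler toy version in which the smoothing of the intensity map is damped by a factor (1 - d)^2, and the parameter names and values (`sigma2`, `rho`, `alpha`, `dt`, `steps`) as well as the noisy test step are assumptions chosen only so that the iteration is stable.

```python
import math


def coupled_diffusion_1d(signal, steps=400, dt=0.1, sigma2=10.0, rho=1.0, alpha=4.0):
    """Evolve an intensity map f and a discontinuity map d on a 1-D signal."""
    n = len(signal)
    f = list(signal)
    d = [0.0] * n

    def at(values, i):
        # Clamp indices so the border samples are mirrored (no flux at the ends).
        return values[min(max(i, 0), n - 1)]

    for _ in range(steps):
        new_f, new_d = [], []
        for i in range(n):
            # Conductance between neighbours drops where d is high.
            c_right = (1.0 - 0.5 * (d[i] + at(d, i + 1))) ** 2
            c_left = (1.0 - 0.5 * (d[i] + at(d, i - 1))) ** 2
            flux = c_right * (at(f, i + 1) - f[i]) - c_left * (f[i] - at(f, i - 1))
            # Smoothing term plus a term pulling f back to the input signal.
            new_f.append(f[i] + dt * (flux - (f[i] - signal[i]) / sigma2))
            # d is smoothed, decays, and is driven towards 1 by large gradients of f.
            grad = 0.5 * abs(at(f, i + 1) - at(f, i - 1))
            lap_d = at(d, i + 1) - 2.0 * d[i] + at(d, i - 1)
            new_d.append(d[i] + dt * (rho * lap_d - d[i] / rho
                                      + 2.0 * alpha * (1.0 - d[i]) * grad))
        f, d = new_f, new_d
    return f, d


# Assumed test input: a unit step at sample 20 with a small deterministic ripple.
noisy_step = [(0.0 if i < 20 else 1.0) + 0.05 * math.sin(1.7 * i) for i in range(40)]
intensity_map, discontinuity_map = coupled_diffusion_1d(noisy_step)
peak = max(range(len(discontinuity_map)), key=lambda i: discontinuity_map[i])
print("strongest boundary response at sample", peak)
```

Each update uses only the two neighbouring samples, which is the property that makes this family of schemes attractive for fine-grain parallel implementations.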
The basic cue used in the previous example was intensity. The orientation of
local texture is another basic cue. As orientation is defined modulo π (contrast
polarity is disregarded), differences can better be measured as the sine of twice
the orientation difference 7.
A second difference with intensity is that orientation has to be estimated from
the outputs of a whole bank of oriented filters with a discrete series of preferred
orientations. The modified equations are
— = ( ( sin (2 - ^ Sq sin (2 - ( + 

— = P ^ “7 + 2 ( - sin(2 

In these equations the outputs of the oriented filters enter, each filter having
its own preferred orientation. There are N filters, with orientations π/N apart.
Both equations have the same structure as before. For instance, in the first, one
term imposes continuity on the feature at hand, while the second tries to keep the
estimated orientation close to that suggested by the raw filter data 6.
An example of orientation based grouping is shown in fig. 3. Figure 3(a) shows
a microscope image of a metallic structure. The texture consists of several subparts
with homogeneous orientation. Six filters were used, with preferred orientations
30° apart. The end state of the orientation map (first equation) is shown in
fig. 3(b). The orientation is coded as intensity. The different segments are clearly
visible. Fig. 3(c) shows the end state of the discontinuity map (second equation).
Bright parts are where boundary strength is high (close to 1). As in the human
visual system, the effect of combining the outputs of the filters by these equations
is that orientation can be determined with better precision than their orientation
differences might suggest (so-called "hyperacuity").

Human perception of intensity shows a number of peculiarities, such as Mach band effects (percept of over- and undershoots of intensity near boundaries)





204 Marc Proesmans and Luc Van Gool




Fig. 3. (a) texture of a slice of metallic material, as seen through a microscope; 
(b) result for the orientation map, with orientation coded as intensities; (c) result 
for the discontinuity map. 



and illusory contours (percept of edges where they are completely absent). We have not tried to model the latter in this grouping framework yet. The Mach band effect can be reproduced if also higher derivatives of intensity are used in the equations [16].

5 Bilocal Grouping 

5.1 The Bilocal Blueprint 

Bilocal grouping processes such as the extraction of motion vectors, stereo disparities and symmetric point pairs involve combining points at two different positions, albeit maybe at different instances or in different images. These point pairs are constrained to have similar values for a basic feature, such as intensity or orientation. Here only intensity will be used for the illustration of the different bilocal processes.

In order to fix ideas, we present a bilocal process for optical flow. The equations of Horn and Schunck [5] serve as our point of departure. The optical flow vectors $(u, v)$ are found as the solution of

$$\frac{\partial u}{\partial t} = \nabla^2 u - \lambda I_x (I_x u + I_y v + I_t), \qquad \frac{\partial v}{\partial t} = \nabla^2 v - \lambda I_y (I_x u + I_y v + I_t) \qquad (1)$$

with $I$ intensity and subscripts indicating partial derivatives. A problem with this system is that the linear diffusion operators force the displacement field to blur motion boundaries. Furthermore, it is only effective in guiding the search for optical flow in as far as the local intensity profile varies linearly within a neighbourhood of dimensions comparable to the motion distance. Both these problems are tackled by adapting the equations.
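As a rough illustration, the Horn and Schunck system can be iterated numerically as two diffusion equations, one per flow component, each with a data term that penalises violations of the optical flow constraint. The sketch below assumes an explicit Euler scheme, a 5-point Laplacian, and a weight `lam` on the data term; the function names, the step size `dt` and the iteration count are illustrative choices, not values from the text.

```python
import numpy as np

def laplacian(f):
    """5-point Laplacian with replicated borders (linear diffusion operator)."""
    p = np.pad(f, 1, mode="edge")
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * f

def horn_schunck_diffusion(I1, I2, lam=1.0, dt=0.2, n_iter=500):
    """Evolve (u, v) under linear diffusion plus the optical flow data term."""
    I1 = np.asarray(I1, dtype=float)
    I2 = np.asarray(I2, dtype=float)
    Iy, Ix = np.gradient(0.5 * (I1 + I2))  # spatial derivatives (rows = y, columns = x)
    It = I2 - I1                           # temporal derivative
    u = np.zeros_like(I1)
    v = np.zeros_like(I1)
    for _ in range(n_iter):
        residual = Ix * u + Iy * v + It    # violation of the optical flow constraint
        u, v = (u + dt * (laplacian(u) - lam * Ix * residual),
                v + dt * (laplacian(v) - lam * Iy * residual))
    return u, v
```

Because the Laplacian smooths `u` and `v` everywhere with the same strength, this version blurs motion boundaries, which is the first of the two problems named in the text.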
The first problem can be alleviated if the diffusion operators are replaced by anisotropic diffusion. The second problem can be handled precisely by switching to a bilocal rather than H&S's local formulation. The underlying idea can be easiest







be explained for the 1D case. Suppose we are interested in the motion for a point with surrounding intensity profile as shown in fig. 4(a). According to the optical




Dual Scheme 




Fig. 4. a) The assumption that the intensity profile varies linearly may cause big
errors for larger motions. b) Schematic overview of the proposed bilocal scheme.



flow constraint that H&S use, $u = -I_t / I_x$. This is correct over infinitely short time lapses, but deviations can be severe with images that are taken at video rate or slower. The problem lies in the fact that the intensity profile is not linear over the distances traveled by the points between the two images. The problem could be reduced if we had a good approximation of the motion. A similar optical flow constraint can then be formulated for the residual motion, which is much smaller and for which the intensity profile therefore has a higher chance of obeying the linearity condition. In [15] the following, mathematical reformulation of the optical flow constraint is derived

u+ + {uo 0=0 



where 

( t — ( — uq t — Q t t — t 

[uo 0 = — Uo — 0 H 

and with $(u_0, v_0)$ the approximation for the motion vector. The result is a bilocal expression that needs information at the current position in the second image and at the position shifted back by the approximate displacement in the first. The procedure consists of successively updating the displacement estimates and using them to compute the spatial (or temporal) gradients at a shifted location which changes with the latest displacement estimates.

There are different ways to obtain the initial guess for the displacements. One is to first apply the traditional H&S equations. Even if the motion vectors are imprecise at places, here and there the estimated motions will be close to the real ones. These spots allow the process to lock on to the correct motion field and the correct solution will spread through the diffusion process. Secondly, multi-scale








techniques can help to quickly bridge larger distances. Thirdly, there always is the possibility to kickstart the system from an initial, random field. Again, places where the motion happens to be close to the real solution can suffice to bootstrap the whole process.



Apart from the anisotropic diffusion and the bilocal optical flow constraint, a third difference with the H&S system is an additional boundary process [15]. Creating such a map is more intricate for the bilocal processes. Near boundaries, parts of the background get occluded or become visible. These areas have no corresponding points in the other image. Hence, in one of the images the displacement field is undefined. This creates an interesting asymmetry between the images for such points, that does not exist for points that are visible in both images. Indeed, one can consider two displacement fields: where points in the first image go to in the second and vice versa. Consider a displacement vector from the first image to its corresponding point in the second and then adding the displacement vector from this latter point to its corresponding point in the first image. The two displacement vectors will annihilate each other, except in boundary regions. The displacement vector for a point only visible in one image will make no sense. The vector in whatever point is assigned to it will not obey the annihilation rule.

The magnitude of the sum C of the two vectors can therefore be used to detect boundaries. Indeed, we can let this magnitude drive a discontinuity map of the type that we used for the local processes. In this case, it might be more appropriate to call it a discrepancy map, as complete regions might obtain high values as they are only visible in one of the images. In fact, there are two such sum vectors: one can start either from the first or the second image and take the corresponding point's vector to form the sum. In order to detect occluded and disoccluded regions such a dual scheme is necessary: displacements from the first to the second image and displacements from the second to the first image are extracted. The overall system is schematically shown in fig. 4(b). It consists of six coupled diffusion equations, four of which describe the feature maps, i.e. the motion vectors of the dual scheme (both forward and backward); the other two measure the discrepancy from the point of view of each of the images.
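The annihilation rule can be made concrete with a small sketch: follow the forward displacement of every pixel, look up the backward displacement stored at the point it lands on, and take the magnitude of the sum. The array layout (height, width, 2) with (dx, dy) components, the nearest-pixel lookup and the function name `discrepancy` are assumptions made for this illustration; the text itself lets this magnitude drive a diffusion map rather than using it directly.

```python
import numpy as np

def discrepancy(fwd, bwd):
    """Magnitude of forward vector plus backward vector at the landing point.

    fwd, bwd: arrays of shape (H, W, 2) holding (dx, dy) per pixel.
    """
    H, W, _ = fwd.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # Nearest pixel each first-image point lands on in the second image.
    xt = np.clip(np.rint(xs + fwd[..., 0]).astype(int), 0, W - 1)
    yt = np.clip(np.rint(ys + fwd[..., 1]).astype(int), 0, H - 1)
    total = fwd + bwd[yt, xt]  # vanishes where the two fields are consistent
    return np.hypot(total[..., 0], total[..., 1])
```

Swapping the roles of `fwd` and `bwd` gives the second discrepancy of the dual scheme, measured from the point of view of the other image.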



u 

T 

u 



( ( u - { 

{{ U - { 

( ( - ( 

( ( - ( 

p 2 -- + 2 ( - 

P 

p 2 --+2 ( - 

P 



U + + (w 0 

u + + {w 0 

U + + {w 0 

U + + {w 0 

C {w w 
C {w w 



(2 







The discrepancies (again one for each of the dual schemes) guide the anisotropic diffusion in the maps that extract the motion components $u$ and $v$. For a detailed account the reader is referred to [15].

5.2 Bilocal Specialisations 

As mentioned earlier, the bilocal processes are introduced to handle grouping based on cues such as motion, stereo, and symmetry. The detection of regular texture is another example. Each of these benefit from certain adaptations.
Motion sequences typically consist of more than two images. Better precision is obtained by considering frames that are separated by a larger time lapse. Displacements between such frames should be the sum of frame-to-frame displacements and hence a good initialisation is available. Computation speed from frame to frame can be increased as well, as motion fields will normally not change drastically from one frame to the next. The latest displacement field can serve as an excellent initialisation for the next frame. For the examples in this paper, a multiview approach was used.

For stereo the two images have to suffice. If disparities are rather large, there might be a problem of initialisation. A multi-scale approach can alleviate this problem [10]. At all scales a bilocal strategy is used, however. Just applying blurring linearises the intensity profile but also increases problems of intensities getting 'contaminated' by parts not visible in the other image. Also precise localisation still benefits from the possibility to work locally around two different positions in the two images. A further specialisation is possible for camera configurations yielding horizontal epipolar lines. In that case the vertical disparity components can be put to zero and the system reduces to a set of 4 coupled equations. Even if epipolar lines do not run perfectly horizontal, it is often useful to start the system up with only these 4 equations active, to get a good initialisation more rapidly, and then plug in the additional equations for the vertical components.

For symmetry a more fundamental adaptation is required, as ideally nearby displacement vectors are not the same but vary in a specific way. Corresponding points lie diametrically along a symmetry axis. This means that the displacement field shows a rather steep gradient which can not be handled by a simple smoothness operator. Furthermore, the actual position of the axis is not known beforehand, which complicates the search process. The bilocal process can be adapted to look for symmetry of a predefined orientation (or an orientation close to that predefined orientation for that matter). This might seem a rather clumsy solution, but orientation effects in human symmetry detection suggest that also the brain has to perform a kind of scan over possible orientations. The order of this scan has been the subject of intensive debate in perception research. To give an idea of how the bilocal scheme can be adapted, a vertical symmetry module is discussed. In that case we only have to deal with horizontal displacement. The following changes are made:

- First of all, the smoothing operator is replaced by a modified divergence operator, which takes the desired change in the horizontal displacement $u$ with $x$ into account.

- The horizontal gradient in the equations (2) has to be inversed, since along the horizontal direction corresponding points have mirrored intensity profiles.

- Two opposite displacement fields with relatively short displacements are taken as initialisation of the dual displacement scheme.

This modified bilocal process successfully locks on to the symmetry near its axis. From there it spreads to areas further from the axis. Psychophysical findings with human symmetry perception suggest a similar process. It would explain why disruption of symmetry near the axis has such a profound impact on symmetry detection. The model is also consistent with human vision in the sense that symmetry need not be perfect. The axis can be somewhat curved, intensity at symmetric positions doesn't need to be identical, etc.

For the detection of regular texture the main difference from the basic scheme lies in the fact that several displacement fields - each corresponding to one type of periodicity - can coexist. We want to find all the different, smallest periods in the pattern. This is achieved by initialising several constant displacement fields with different orientations and vector lengths. They will lock on to the periodic structure if there is local agreement. These matches propagate to other areas as the initial displacement fields get distorted in order to follow the variations in texture orientation and spacing. Several of the initialisation fields may lock on to the same periodicity, but if they sample different orientations and displacements densely enough, every type of periodicity will be 'detected' by at least one initialisation field. Finding out whether periodicity has been detected is easy, as the discrepancy maps will show low values.

5.3 Examples of Bilocal Grouping 

Fig. 5 shows an example of motion extraction. Cars drive with different speeds on a crossroad. Part (a) shows one of the frames of the video input. Part (b) shows the magnitudes of the extracted motion vectors. Brighter means faster. The outlines are sharp, but due to the moving shadows on the ground they do




Fig. 5. a: Frame from a video of a traffic scene, b: Magnitudes of the extracted 
motion vectors. c,d: Discrepancies for each of the dual schemes. 






not precisely coincide with the outlines of the cars. Parts (c) and (d) show the discrepancies for each of the dual schemes. One (c) detects the parts of the road that become visible again, while the other (d) shows parts that are getting occluded. In fact, one can AND both maps and get the contours of the cars. An OR operation of the maps yields the regions that are not visible in one of the views.

Fig. 6 gives a second example of motion extraction. Five frames of a swimming fish were taken as input. Part (a) shows one of the frames. The velocities calculated for two subsequent frames are shown in (b). Part (c) shows the result for the velocity from the first to the fifth frame, based on the sum of frame-to-frame motions as initial displacement fields. As can be seen, the object is delineated very sharply from the background and the velocity is very homogeneous over the fish's body.

Fig. 6. a: One of 5 frames with a swimming fish, b: Velocity magnitude when
motion is only calculated between two frames, c: Velocity magnitude when motion
between the first and the fifth frame.

Fig. 7 (a) shows a stereo pair of two manikins. As part (b) shows, the depth discontinuities are precisely localised in the discrepancy maps. Note that the background is also highlighted as yielding irrelevant disparity data. This is achieved by an additional 'texture map', another diffusion process that divides the scene into textured and untextured parts [17]. Disparities are suppressed for untextured parts. Part (c) shows two views of the 3D reconstruction made on the basis of the extracted disparities. The disparities were found with the help of a multi-resolution approach: a three-level image pyramid was used.
Fig. 8 shows an example for a face, which is only weakly textured compared to the manikins. Nevertheless, the reconstruction is reasonable. Again, disparities on the background were suppressed through the texture map.
Figure 9 shows two scenes with symmetry. Part (b) shows an X-ray of a human thorax. The symmetry is not perfect. The places with zero displacement indicate the position of the symmetry axis. They are highlighted in the figure. The same was done for the head-shoulder scene ('Claire') in part (c). The process is seen to be quite tolerant to deviations from ideal symmetry.




Fig. 7. a: Original stereo image pair, b: Resulting discrepancies. c: Two views
of the 3D reconstruction.

Fig. 8. a: Original stereo image pair of a face, b: Two views of the reconstructed

















Fig. 9. Examples of symmetry detection; b: X-ray image of a thorax, c: head- 
shoulder scene. 



The detection of regular texture is illustrated in fig. 10. Part (a) shows the input image. The goal is to find the repeated textures of the shirt. Part (b) highlights regions where regular texture was found. Darker zones separate the different pieces of textile where they are knitted together. Part (c) gives an idea of the precision of the extracted periodicities. Using a shape-from-texture approach, local surface orientations are given for points clicked on by the user. The orientations fit our visual expectations.




Fig. 10. a: Original image, b: Segmentation of relevant periodical texture areas, 
c: Estimated surface orientation using shape-from-texture. 






In the bilocal processes described so far, point pairs are preferred with identical values for the basic feature (here only intensity). When presented with different options the visual system need not even prefer the solution with identical values, however [12]. We have built extended bilocal schemes that allow intensities to change between views (with spatially varying scale + offset). A discussion is out of the scope of this paper.

6 Conclusions 

In this paper we have argued that grouping is necessarily based on task-oriented sets of rules, not a universal principle. We have then tried to set out guidelines for the selection and implementation of grouping rules. The framework that we propose is based on the evolution of coupled, non-linear diffusion equations. In these systems, each of the equations determines the evolution of a relevant image cue, such as intensity, local texture orientation, motion components, etc. Feature maps come with discontinuity maps of their own, which make explicit the boundaries of segments that are homogeneous in those features. The maps are organised retinotopically and the number of connections each pixel would need for such equations to be implemented on hardware with fine-grain parallelism can be kept low.

Future research will be aimed at combining different maps into a single system. Work will be needed on the connections between cues. As an example, the 3D reconstruction of the face in fig. 8 can be improved by exploiting its symmetry.

Acknowledgment: Support by Esprit-LTR 'Improofs' is gratefully acknowledged.

References 

1 L. m rosio . or or Hi Approximation of functionals depending on jumpls by 

elliptic fnctionals via r -convergence, omm.. ur n ppl. h. vol. 3 pp. 
999-103 1990 

2 . 1 k n . iss rm n Visual Reconstruction r ss 19 

3 . h i in n omn ss n m h m i 1 proo Scientific American ol. 232 

o. pp. - 2 19 

ross r n . o orovi ur 1 yn mi s o 1- n 2- ri h n ss p r- 

p ion uni mo 1 o 1 ssi 1 n r n ph nom n r p ion sy- 
hophysi s 3 2 1-2 19 

orn .K. . n hun k . rminin op i 1 flow. . 17 1 203 19 1. 

nkins un n y in h p r p ion o il r 1 symm ry in op rns 

r p ion sy hophysi s ol. 32 o. 2 pp. 1 1-1 19 2 

. on s rmin nsohyorhi uroh rrlorxh. lin 

Signal and sense, local and global order in perceptual maps s. Im n 11 
n ow n pp. 3- 0 il y-Liss 1990 

. K n ov ry o h 3- im nsion 1 sh p o n o j rom sin 1 vi w 

r i i 1 n Hi n ol.l pp. -11 19 1 

9 . Low r p u 1 r niz ion n isu 1 o ni ion n or niv rsi y 

hni 1 r por - - -1020 19 





10 . rr n . o io h rory o hum ns r opsis ro . oy 1 o . 20 

pp.301-32 19 9. 

11 . um or n . h h o im 1 pproxim ion y pi wis smoo h un ions 

n sso i V ri ion 1 pro 1 ms omm. on ur n ppli h. ol. 2 

pp. - 19 9 

12 . p hom s . Kov s .or n . ul sz uni ppro h o h 

p r p ion o mo ion s r o n s i -flow p rns Behavior Research Methods, 

Instruments, & Computers ol. 2 o. pp. 19- 32 199 

13 . ron n . lik 1 - p n ion sin niso ropi i - 

usion ol.l2 o. uly 1990. 

1 . hillips n .inrnsrho ommon oun ions or or i 1 ompu- 

ion Behavioral and Brain Sciences ol. 20 pp. - 22 199 

1 . ro sm ns L. n ool n . os rlin k rmin ion o op i 1 flow 

n is is on inui i s usin non-lin r iffusion 29 -30 m y 199 

1 . ro sm ns . uw Is n L. n ool oupl om ry- riv n iffusion 

qu ions or low-1 v 1 vision in om ry- riv n iffusion in ompu r ision 
Kluw r mi u lish rs pp.l91 22 199 . 



1 


ro sm ns L. 


n 


ool n 


os rlin k 


roupin hrou h lo 


1 p r- 




11 1 in r ions 




n . ymp. on p i 1 


i n ppl. 0 i i 


1 m 




ro ssin 


oL2 


pp. 


9 199 






1 


. h h m n ion 


y non-lin r 


iffusion. 


1991. 




19 


. r ism n n 


1 


ur 


in r ion 


h ory 0 n ion 


0 ni iv 




sy holo y ol. 12 


pp. 


9 -13 19 0 








20 


. r ism n r 


n iv pro ssin in vision 


31 1 -1 19 




21 


. r ism n . v 


n 


h . is h r 


m h 


n r n n . von 


r y 




orm p r p ion n 




n ion s ri 


or X n 


yon in Visual Perception: 




The Neurophysiological Foundations 


mi r 


ss 1990 




22 


. von r y 




rh ns n 


um r 


n r llusory on ours 


n or- 




i 1 n uron r spons 


s 


i n 0 I .22 


pp.l2 0-12 2 19 




23 


r h im r L ws o 


or niz ion 


in p r pul orms in A source-book of 



Gestalt Psychology . . llis r our r no. pp. 1- 193 

2 . ki un ion 1 sp i lis ion in h visu 1 or x h nr ion o s p r 

ons ru s n h ir mul is in r ion h. in Signal and sense, local and 
global order in perceptual maps s. Im n 11 n ow n pp. -130 il y- 
Liss 1990 

2 . ki j4 Vision of the Brain 1 kw 11 ini u li ions 199 . 

2 . u k r rly pro ss s or ori n ion s 1 ion n roupin in From Pixels 

to Predicates . . n 1 n pp.l 0-200 lx w rs y 19 




Integrating Geometric and Photometric 
Information for Image Retrieval 

Cordelia Schmid^1, Andrew Zisserman^2, and Roger Mohr^1

^1 INRIA Rhône-Alpes, 655 av. de l'Europe, 38330 Montbonnot, France
^2 Dept of Engineering Science, 19 Parks Rd, Oxford OX1 3PJ, UK



Abstract. We describe two image matching techniques that owe their 
success to a combination of geometric and photometric constraints. In the 
first, images are matched under similarity transformations by using local 
intensity invariants and semi-local geometric constraints. In the second, 
3D curves and lines are matched between images using epipolar geometry 
and local photometric constraints. Both techniques are illustrated on real 
images. 

We show that these two techniques may be combined and are comple- 
mentary for the application of image retrieval from an image database. 
Given a query image, local intensity invariants are used to obtain a set of 
potential candidate matches from the database. This is very efficient as 
it is implemented as an indexing algorithm. Curve matching is then used
to obtain a more significant ranking score. It is shown that for correctly 
retrieved images many curves are matched, whilst incorrect candidates 
obtain very low ranking. 



1 Introduction 

The objective of this work is efficient image based matching. Suppose we have 
a large database of images and wish to retrieve images based on a supplied 
’query’ image. The supplied image may be identical to one in the database. 
However, more generally the supplied image may differ both geometrically and 
photometrically from any in the database. For example, the supplied image may 
only be a sub- or super-part of a database image, or be related by a planar 
projective transformation, or the images may be two views of the same scene 
acquired from different viewpoints. 

An example application to keep in mind is the retrieval and matching of 
aerial views of cities. If the supplied image is acquired from a large distance, by a 
satellite for example, then the geometric distortions with respect to the database 
images are planar projective and partial overlap. However, if the supplied image 
is acquired at a distance where motion parallax is significant, by a low flying 
plane for example, then the geometric distortion can not be covered by a planar 
transformation and 3D effects must be taken into account. The illumination 
conditions (sun, clouds etc.) may well also differ between the supplied image
and the database images.



D.A. Forsyth et al. (Eds.): Shape, Contour .... LNCS 1681, pp. 217-233, 1999. 
(c) Springer- Verlag Berlin Heidelberg 1999 




218 



Cordelia Schmid, Andrew Zisserman, and Roger Mohr 



There are two key ideas explored here. The first is that matching can be 
made more robust by using both geometric and photometric information. This 
is illustrated in two ways: first, in section 2, we describe a method of image 
retrieval based on local interest point descriptors which is invariant to image 
similarity transformations; second, in section 3, we describe a method of curve 
matching between images of 3D scenes acquired from different viewpoints. 

The second idea is that the efficiency of indexing using interest points can be 
supplemented by the verification power of curve matching. This is illustrated in 
section 4 where it is shown that the interest point matcher provides fast access to 
an image database, and that the retrieved images may be ranked by the number 
of matched curves. 

2 Image Retrieval Based on Intensity Invariants 

The key contribution of several recognition systems has been a method of cutting 
down the complexity of matching. For example, tree search is used in [2]. In indexing,
the feature correspondence and search of the model database are replaced
by a look-up table mechanism [10]. The major difficulty of these approaches is 
that they are geometry based which implies that they require CAD-like repre- 
sentations such as line groupings or polyhedra. These representations are not 
available for objects such as trees or paintings, and can often be difficult to 
extract even from images of suitable CAD-like objects. 

An alternative approach is to not impose what has to be seen in the image 
(points, lines . . . ) but rather to use the photometric information in the image 
to characterise an object. Previous approaches have used histograms [18] and 
related measures which are less sensitive to illumination changes [5, 11]. 




[Diagram label: vector of local characteristics]



Fig. 1. Representation of an image. 



The idea reviewed here, which originally appeared in [13, 14], is to use local 
intensity invariants as image descriptors. These descriptors are computed at 
automatically detected interest points (cf. figure 1). Interest points are local 
features with high informational content [15] and enable differentiation between 
many objects. Image retrieval based on the intensity invariants can be structured 
efficiently as an indexing task. 

Experimental results show correct retrieval in the case of partial visibility, 
similarity transformations, extraneous features, and small perspective deforma- 
tions. 





Integrating Geometric and Photometric Information for Image Retrieval 219 



2.1 Interest Points 

Computing image descriptors for each pixel in the image creates too much in- 
formation. Interest points are local features at which the signal changes two- 
dimensionally. In the context of matching, detectors should be repeatable, that 
is a 3D point should be detected independently of changes in the imaging con- 
ditions. A comparison of different detectors under varying conditions [13] has 
shown that most repeatable results are obtained for the detector of Harris [6]. 
The basic idea of this detector is to use the auto-correlation function in order to 
determine locations where the signal changes in two directions. 
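To make the idea concrete, here is a minimal sketch of a Harris-style response computed from the smoothed products of first derivatives (the local auto-correlation matrix). The Gaussian window, the constant `kappa = 0.04`, the use of `np.gradient` for derivatives and all function names are assumptions of this illustration, not details taken from the text.

```python
import numpy as np

def gaussian_smooth(f, sigma):
    """Separable Gaussian smoothing with zero padding at the borders."""
    r = int(3 * sigma + 0.5)
    x = np.arange(-r, r + 1)
    k = np.exp(-x**2 / (2.0 * sigma**2))
    k /= k.sum()
    f = np.apply_along_axis(lambda m: np.convolve(m, k, mode="same"), 0, f)
    return np.apply_along_axis(lambda m: np.convolve(m, k, mode="same"), 1, f)

def harris_response(img, sigma=1.5, kappa=0.04):
    """Large positive where the signal changes in two directions."""
    Iy, Ix = np.gradient(np.asarray(img, dtype=float))
    A = gaussian_smooth(Ix * Ix, sigma)  # entries of the auto-correlation matrix
    B = gaussian_smooth(Iy * Iy, sigma)
    C = gaussian_smooth(Ix * Iy, sigma)
    return (A * B - C * C) - kappa * (A + B) ** 2
```

On a bright square against a dark background, this response is positive at the corners, negative along the straight edges and zero in flat regions; interest points would be taken as local maxima of the response.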

Figure 2 shows interest points detected on the same scene under rotation. 
The repeatability rate is 92% which means that 92% of the points detected in 
the first image are detected in the second one. Experiments with images taken 
under different conditions show that the average repeatability rate is about 90%. 
Moreover, 50% repeatability is sufficient for the remaining process if we use 
robust methods. 




Fig. 2. Interest points detected on the same scene under rotation of the world 
plane. The image rotation between the left image and right image is 155 degrees. 
The repeatability rate is 92%. 



2.2 Intensity Invariants 

The neighbourhood of each interest point is described by a vector of local in- 
tensity derivatives. These derivatives are computed stably by convolution with 
Gaussian derivatives. In order to obtain invariance under rigid displacements in 
the image, differential invariants are computed [4, 9]. The invariants used here 
are limited to third order. The vector which contains these invariants is denoted 
V. Among the components of V are the average luminance, the square of the 
gradient magnitude and the Laplacian. 
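The three components named above can be sketched as follows. This is an illustrative approximation only: derivatives are taken with finite differences of a Gaussian-smoothed image rather than by convolution with true Gaussian derivative kernels, and the function names and `sigma` value are hypothetical.

```python
import numpy as np

def gaussian_smooth(f, sigma):
    """Separable Gaussian smoothing with zero padding at the borders."""
    r = int(3 * sigma + 0.5)
    x = np.arange(-r, r + 1)
    k = np.exp(-x**2 / (2.0 * sigma**2))
    k /= k.sum()
    f = np.apply_along_axis(lambda m: np.convolve(m, k, mode="same"), 0, f)
    return np.apply_along_axis(lambda m: np.convolve(m, k, mode="same"), 1, f)

def local_invariants(img, sigma=2.0):
    """Per pixel: average luminance, squared gradient magnitude, Laplacian."""
    L = gaussian_smooth(np.asarray(img, dtype=float), sigma)
    Ly, Lx = np.gradient(L)
    Lyy, _ = np.gradient(Ly)
    _, Lxx = np.gradient(Lx)
    return np.stack([L, Lx * Lx + Ly * Ly, Lxx + Lyy])
```

Each of the three quantities is unchanged when the image is rotated about the point of interest, which is what makes them usable as components of the descriptor.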

To deal with scale changes, invariants are inserted into a multi-scale frame- 
work, that is the vector of invariants is computed at several scales [21]. Scale 
quantisation is of course necessary for a multi-scale approach. Experiments have 
shown that matching based on invariants is tolerant to a scale change of 20% [13]. 
We have thus chosen a scale quantisation which ensures that the difference be- 
tween consecutive sizes is less than 20%. 
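One way to obtain such a quantisation is a geometric series of scales whose ratio stays just under 1.2. The sketch below assumes a range `[s_min, s_max]` of scales to cover; the function name and the choice of geometric spacing are illustrative, not taken from the text.

```python
import math
import numpy as np

def scale_levels(s_min, s_max, max_step=0.2):
    """Geometrically spaced scales whose consecutive ratio is below 1 + max_step."""
    # Smallest number of steps n with (s_max / s_min) ** (1 / n) < 1 + max_step.
    n = math.floor(math.log(s_max / s_min) / math.log(1.0 + max_step)) + 1
    return np.geomspace(s_min, s_max, n + 1)
```

For example, covering a factor of 4 in scale this way takes nine levels, each about 19% larger than the previous one.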







Our characterisation is now invariant to similarity transformations which are 
additionally quasi-invariant to 3D projection [1]. 

2.3 Retrieval Algorithm 

Vector comparison Similarity of two invariant vectors is quantified using the
Mahalanobis distance $d_M$. This distance takes into account the different magnitude
as well as the covariance matrix $\Lambda$ of the components. For two vectors $\mathbf{a}$
and $\mathbf{b}$, $d_M(\mathbf{b}, \mathbf{a}) = \sqrt{(\mathbf{b} - \mathbf{a})^T \Lambda^{-1} (\mathbf{b} - \mathbf{a})}$.

In order to obtain accurate results for the distance, it is important to have 
a representative covariance matrix which takes into account signal noise, lu- 
minance variations, as well as imprecision of the interest point location. As a 
theoretical computation seems impossible to derive given realistic hypotheses, it 
is estimated statistically here by tracking interest points in image sequences. 

The Mahalanobis distance is impractical for implementing a fast indexing
technique. However, a base change makes conversion into the standard Euclidean
distance $d_E$ possible.
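The base change can be illustrated with a short sketch: after multiplying every vector by a suitable matrix, ordinary Euclidean distance reproduces the Mahalanobis distance. The Cholesky factorisation used here is one possible choice of transform and is an assumption of this example, as are the function names.

```python
import numpy as np

def mahalanobis(a, b, cov):
    """Mahalanobis distance between vectors a and b for covariance matrix cov."""
    d = b - a
    return float(np.sqrt(d @ np.linalg.solve(cov, d)))

def whitening_matrix(cov):
    """Matrix W such that ||W b - W a|| equals the Mahalanobis distance."""
    L = np.linalg.cholesky(cov)  # cov = L L^T
    return np.linalg.inv(L)
```

Transforming all database vectors once with `W` means that later comparisons, and any indexing structure built on them, only need Euclidean distances.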

Image database A database contains a set $\{M_k\}$ of models. Each model $M_k$
is defined by the vectors of invariants $\{V_j\}$ calculated at the interest points of
the model images. During the storage process, each vector $V_j$ is added to the
database with a link to the model $k$ for which it has been computed. Formally,
the simplest database is a table of couples $(V_j, k)$.

Voting algorithm Recognition consists of finding the model $M_{\hat{k}}$ which corresponds
to a given query image $I$, that is the model which is most similar to this
image. For this image a set of vectors $\{V_l\}$ is computed which corresponds to the
extracted interest points. These vectors are then compared to the $V_j$ of the base
by computing $d_M(V_l, V_j) = d_{l,j}$ for all $(l, j)$. If this distance is below a threshold $t$,
the corresponding model gets a vote.

The idea of the voting algorithm is to sum the number of times each model
is selected. This sum is stored in the vector $T(k)$. The model that is selected
most often is considered to be the best match: the image represents the model
$M_{\hat{k}}$ for which $\hat{k} = \arg\max_k T(k)$.
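A brute-force version of the voting step can be sketched as follows. It assumes the vectors have already been transformed so that Euclidean distance can be used, and that the database is held as an array of vectors with a parallel array of model indices; the function name and argument layout are illustrative.

```python
import numpy as np

def vote(query_vectors, db_vectors, db_model_ids, n_models, t):
    """Return the vote table T(k) and the index of the best-scoring model."""
    T = np.zeros(n_models, dtype=int)
    for v in query_vectors:
        dist = np.linalg.norm(db_vectors - v, axis=1)
        # Every database vector closer than t casts one vote for its model.
        for k in db_model_ids[dist < t]:
            T[k] += 1
    return T, int(np.argmax(T))
```

This version compares every query vector with every database vector, so its cost grows with the product of the two counts.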

Figure 3 shows an example of a vector $T(k)$ in the form of a histogram. Image
0 is correctly recognised. However, other images have obtained almost equivalent 
scores. 

Multi-dimensional indexing Without indexing the complexity of the voting algorithm
is of the order of $l \times N$ where $l$ is the number of features in the query
image and $N$ the total number of features in the database. As $N$ is large (about
150,000 in our tests) an indexing technique needs to be used.

Our search structure is a variant of k-d trees. Each dimension of the space
is considered sequentially. Access to a value in one dimension is made through 
fixed size 1-dimensional buckets. Corresponding buckets and their neighbours 
can be directly accessed. Accessing neighbours is necessary to take into account 









[Histogram: number of votes (vertical axis, 0 to 160) for each model (horizontal axis, 0 to 100)]



Fig. 3. Result of the voting algorithm : the number of votes are displayed for 
each of the 100 model images. Image 0 is recognised correctly. 



uncertainty. The complexity of such an indexing is of the order of $l$ (number of
features of the query image).
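The bucket idea can be sketched with a hash table keyed by quantised coordinates. This is a deliberately simplified stand-in, not the search structure of the text: it quantises all dimensions at once instead of considering them sequentially, and the class name `BucketIndex` and its methods are hypothetical. It does show why neighbouring buckets must be visited: a stored vector close to a bucket border may fall on the other side of it.

```python
from itertools import product
import numpy as np

class BucketIndex:
    """Fixed-size buckets per dimension; lookups visit neighbouring buckets."""

    def __init__(self, bucket_size):
        self.size = bucket_size
        self.table = {}

    def _key(self, v):
        return tuple(int(c) for c in np.floor(np.asarray(v, dtype=float) / self.size))

    def add(self, v, model_id):
        entry = (np.asarray(v, dtype=float), model_id)
        self.table.setdefault(self._key(v), []).append(entry)

    def query(self, v, t):
        """Model ids of stored vectors within distance t (requires t <= bucket size)."""
        v = np.asarray(v, dtype=float)
        key = self._key(v)
        hits = []
        for offset in product((-1, 0, 1), repeat=len(key)):
            bucket = tuple(a + b for a, b in zip(key, offset))
            for w, k in self.table.get(bucket, []):
                if np.linalg.norm(w - v) < t:
                    hits.append(k)
        return hits
```

The lookup cost depends on the bucket contents near the query rather than on the total database size. Note that visiting all neighbours at once grows as 3 to the power of the dimension, so this form only suits low-dimensional vectors.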

This indexing technique leads to a very efficient recognition. The mean retrieval
time for our database containing 1020 objects (see figure 6) is less than
5 seconds on an UltraSparc 30.



2.4 Semi-local Constraints 

Having a large number of models or many very similar ones raises the probability 
that a feature will vote for several models. We therefore add the use of local shape 
configurations (see figure 4). 




[Diagram labels: a database entry and its p closest features; a match]



Fig. 4. Semi-local constraints : neighbours of the point have to match and angles 
have to correspond. Note that not all neighbours have to be matched correctly. 

For each feature (interest point) in the database, the p closest features in the 
image are selected. If we require that all p closest neighbours are matched correctly,
we suppose that there is no mis-detection of points. Therefore, we require
that at least 50% of the neighbours match. In order to increase the recognition 
rate further, geometric constraints are added. As we suppose that the transfor- 
mation can be locally approximated by a similarity transformation, angles and 
length ratios of the semi-local shape configurations have to be consistent. 
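The two checks can be sketched as follows: a neighbourhood test that accepts a match when at least half of the p closest features are also matched, and an angle measured at the feature between two of its neighbours, which a similarity transformation leaves unchanged. The function names and the representation of matches as a set of feature ids are assumptions of this illustration.

```python
import numpy as np

def semi_local_ok(db_neighbour_ids, matched_db_ids, min_fraction=0.5):
    """True if enough of a database feature's p closest features are matched."""
    hits = sum(1 for j in db_neighbour_ids if j in matched_db_ids)
    return hits >= min_fraction * len(db_neighbour_ids)

def angle_at(p, q1, q2):
    """Signed angle at point p between the directions to q1 and q2."""
    p = np.asarray(p, dtype=float)
    a = np.asarray(q1, dtype=float) - p
    b = np.asarray(q2, dtype=float) - p
    return np.arctan2(a[0] * b[1] - a[1] * b[0], a @ b)
```

A match would then be kept only if the neighbourhood test passes and the angles (and, analogously, length ratios) measured in the query image agree with those in the database image up to some tolerance.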

An example using the geometrical coherence and the semi-local constraints 
is displayed in figure 5. It gives the votes if semi-local constraints are applied to 






the example in figure 3. The score of the object to be recognised is now much 
more distinctive. 



[Histogram: number of votes (vertical axis, 0 to 30) for each model (horizontal axis, 0 to 100)]

Fig. 5. Result of applying semi-local constraints: the number of votes are displayed
for each model image. Semi-local constraints decrease the probability of
false votes. Image 0 is recognised much more distinctively than in figure 3.



2.5 Experimental Results 

Experiments have been conducted for an image database containing 1020 im- 
ages. They have shown the robustness of the method to image rotation, scale 
change, small viewpoint variations, partial visibility and extraneous features. 
The obtained recognition rate is above 99% for a variety of test images taken 
under different conditions. 

Content of the database The database includes different kinds of images such 
as 200 paintings, 100 aerial images and 720 images of 3D objects (see figure 6). 
3D objects include the Columbia database. These images are of a wide variety. 
However, some of the painting images and some of the aerial images are very 
similar. This leads to ambiguities which the recognition method is capable of 
dealing with. 




Fig. 6. Some images of the database. The database contains more than 1020 images.










Integrating Geometric and Photometric Information for Image Retrieval 223 



In the case of a planar 2D object, an object is represented by one image in
the database. This is also the case for nearly planar objects, as in aerial images.
A 3D object has to be represented by images taken from different viewpoints;
these are stored in the database at viewpoint intervals of 20 degrees.



Recognition results Three examples are now given, one for each type of image. 
For all of them, the image on the right is stored in the database. It is correctly 
retrieved using any of the images on the left. Figure 7 shows recognition of a 
painting image in the case of image rotation and scale change. It also shows that 
correct recognition is possible if only part of an image is given. 




Fig. 7. The image on the right is correctly retrieved using any of the images on 
the left. Images are rotated, scaled and only part of the image is given. 



In figure 8 an example of an aerial image is displayed. It shows correct re- 
trieval in the case of image rotation and if part of an image is used. In the case 
of aerial images we also have to deal with a change in viewpoint and extraneous 
features. Notice that buildings appear differently because viewing angles have 
changed and cars have moved. Figure 9 shows recognition of a 3D object. 




Fig. 8. The image on the right is correctly retrieved using any of the images on 
the left. Images are seen from a different viewpoint (courtesy of Istar). 







224 



Cordelia Schmid, Andrew Zisserman, and Roger Mohr 




Fig. 9. On the left the image used for retrieval and on the right the retrieved 
image. Matched interest points are displayed. 

3 Curve Matching 

In this section we review a method for line and curve matching between two per- 
spective images of a 3D scene acquired from different viewpoints. It is assumed 
that 3D effects cannot be ignored, and that the fundamental matrix, F, for the
image pair is available. We return to how the fundamental matrix is obtained 
in section 4, where it is shown that the number of matched curves provides a 
ranking score in image retrieval. 

Previous criteria for stereo curve matching have included epipolar and order- 
ing constraints, figural continuity [12], variation in disparity [23], and consistency 
of curve groups [3, 7]. The method reviewed here, which originally appeared 
in [16, 17], is to supplement such geometric constraints by photometric con- 
straints on the intensity neighbourhood of the curve. In particular the similarity 
of the curves is assessed by cross-correlation of the curve intensity neighbour- 
hoods at corresponding points. This is described in more detail in the following 
section. 

We will describe two algorithms: the first is applicable to nearby views; the 
second to wide baselines where account must be taken of the viewpoint change. 
The algorithms are robust to deficiencies in the curve segment extraction and 
partial occlusion. Experimental results are given for image pairs with varying 
motions between views. 

3.1 Basic Curve Matching Algorithm 

We suppose that we have obtained lines and curves in each image. The task 
is then to determine which lines/curves, if any, match. The problem is
non-trivial because of the usual problems of fragmentation due to over- and
under-segmentation. The algorithm proceeds by computing a pair-wise similarity
score between each curve (or line) in the first image, and each curve (or line) in
the second. The matches are decided by a winner-takes-all scheme based on the
similarity scores.







The photometric information is employed in computing the similarity score.
Consider two possibly corresponding curves c and c' in the first and second
images respectively. The curves are corresponding if they are images of the same
3D curve. If they are corresponding, then a point-to-point correspondence on
the curves may be determined using the epipolar geometry: for an image point
x on the curve c, the epipolar line in the second image is l'_e = Fx, and this line
intersects the curve c' in the point x' corresponding to x, i.e. x and x' are images
of the same 3D point. Consequently, the image intensity neighbourhoods of x
and x' should be similar. Then the similarity score for c and c' is determined by
averaging the similarity of neighbourhoods for all corresponding points on the
curves. The similarity of neighbourhoods is determined by cross-correlation.
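
A minimal sketch of this similarity score, under several assumptions that are ours rather than the paper's: images are lists of rows of grey values, curves are lists of integer pixel positions, the intersection of the epipolar line with c' is approximated by the sampled curve point closest to the line, and square neighbourhoods of radius r are compared by normalised cross-correlation. All function names are hypothetical.

```python
import math

def epipolar_line(F, x):
    """Epipolar line F x in the second image for the point x = (u, v)."""
    h = (x[0], x[1], 1.0)
    return tuple(sum(F[r][c] * h[c] for c in range(3)) for r in range(3))

def closest_on_curve(line, curve):
    """Curve sample nearest to the line (stands in for the intersection)."""
    a, b, c = line
    n = math.hypot(a, b)
    return min(curve, key=lambda q: abs(a * q[0] + b * q[1] + c) / n)

def patch(img, pt, r):
    """Grey values in the (2r+1) x (2r+1) window centred on pt.
    The window must lie inside the image; borders are not handled."""
    u, v = int(round(pt[0])), int(round(pt[1]))
    return [img[y][x] for y in range(v - r, v + r + 1)
            for x in range(u - r, u + r + 1)]

def ncc(p, q):
    """Normalised cross-correlation of two equally sized patches."""
    mp, mq = sum(p) / len(p), sum(q) / len(q)
    num = sum((a - mp) * (b - mq) for a, b in zip(p, q))
    den = math.sqrt(sum((a - mp) ** 2 for a in p)
                    * sum((b - mq) ** 2 for b in q))
    return num / den if den > 0 else 0.0

def curve_similarity(img1, img2, c1, c2, F, r=1):
    """Average neighbourhood correlation over the points of curve c1."""
    scores = []
    for x in c1:
        xp = closest_on_curve(epipolar_line(F, x), c2)
        scores.append(ncc(patch(img1, x, r), patch(img2, xp, r)))
    return sum(scores) / len(scores)
```

The sketch ignores the cases the text excludes later (curve parts running along epipolar lines, and points with no counterpart in the other view).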

If the curves are indeed corresponding, then the similarity score will be high;
in general it will certainly be higher than the score for images of two different
3D curves. This is the basis of the winner-takes-all allocation of curve matches.
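
The allocation step can be illustrated with a greedy one-to-one assignment over the score matrix. The text does not specify the exact scheme, so the one-to-one restriction, the threshold `min_score` and the function name are assumptions made for this sketch.

```python
def winner_takes_all(scores, min_score=0.0):
    """scores[i][j] is the similarity of curve i (image 1) and curve j (image 2).
    Returns a list of (i, j) matches, best scores first, each curve used once."""
    candidates = sorted(
        ((s, i, j) for i, row in enumerate(scores)
         for j, s in enumerate(row) if s > min_score),
        reverse=True)
    used_i, used_j, matches = set(), set(), []
    for s, i, j in candidates:
        if i not in used_i and j not in used_j:
            matches.append((i, j))
            used_i.add(i)
            used_j.add(j)
    return matches
```

Curves whose best available score does not exceed `min_score` stay unmatched, which corresponds to the "if any" in the problem statement.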



Matching performance The algorithm is demonstrated here on the two image 
pairs shown in figures 10 and 12. The ground-truth matches are assessed by 
hand. 




[Figure 10 panels: frame 11, frame 15, frame 19.]

Fig. 10. The "bottle" sequence. Frames are selected from this sequence to form
image pairs. The camera motion between the frames is fairly uniform, so that
the frame number is a good indicator of the distance between views.



Figure 11 shows a typical matching result for two bottle images (frames 11 and
15). Only the parts of the matched contours for which there are corresponding
edgels in both views are shown. This excludes the parts of the chains along 
epipolar lines (where one-to-one point correspondences are not available), and 
also those parts of the chain which are detected as edgels in one view but not 
in the other. Only corresponding parts are shown for the rest of the examples in 
this paper. 

The performance of the algorithm depends on the number and quality of the
curves detected in each image. However, as shown in table 1, the algorithm
performs extremely well over a 100% variation in the curve segmentation
parameters. The two parameters are the minimum intensity gradient at which edgels
are included in the linked contour (a high value excludes weak edges) and the
minimum number of edgels in the linked chain (a high value excludes short
chains). Most of the mismatches may be attributed to specularities on the bottle.
Curves arising from specularities can be removed by a pre-process.

Fig. 11. Short baseline matching for frames 11 and 15 of the bottle sequence.
Upper pair: the curves which are input to the matching algorithm. The contours
are extracted with a gradient threshold of 60 and a length threshold of 60 pixels.
There are 37 and 47 contours in the left and right images respectively. Lower
pair: the 29 contours matched by the algorithm, showing only the parts which
have corresponding edgels in both views. 97% of the 29 matches are correct.

Figure 12 shows matched line segments for aerial images of an urban scene. 
248 and 236 line segments are obtained for the left and right images, respectively, 
122 of the lines are matched, and 97.5% of the 122 matches obtained are correct. 

It is evident from these examples that for all choices of the parameters shown 
a large proportion (> 80%) of the potential line/curve matches are successfully 
obtained. 



3.2 Wide Base Line Matching Algorithm 

If there is a significant rotation of the camera or a wide baseline between views,
then the simple correlation of image intensities employed above will fail as a
measure of the similarity of the neighbourhoods of corresponding image points
on the curve. Think of a camera motion consisting of a translation parallel to
the image x-axis, followed by a 90° rotation about the camera principal axis (i.e.
a rotation axis perpendicular to the image plane). The cross-correlation of the
neighbourhoods will be very low if the rotation is not corrected for.

min intensity   min curve   number curves   number curves   number curves   correct
grad            length      left            right           matched         matches
60              60          37              47              29              97%
60              30          59              72              41              90%
30              30          85              85              41              85%

Table 1. Edge detection parameters and curve matching results for the short
baseline algorithm applied to frames 11 and 15 of the "bottle" sequence. For the
60/60 case there is one false match. For the 60/30 case there are four mismatches:
one false match and three due to specularities. For the 30/30 case there are six
mismatches: two false matches and four due to specularities.

Fig. 12. Upper pair: Two aerial views of a building acquired from different
viewpoints. Lower pair: Matched segments using the short range motion algorithm.
97.5% of the 122 matches shown are correct.

Suppose the 3D curve lies on a surface. Then the rotation, and in general all
perspective distortion, can be corrected for if the cross-correlation is computed
as follows: for each point in the intensity neighbourhood of x in the first image,
compute the intersection of the back-projected ray with the surface, then
determine the image of this intersection in the second view. The surface defines a
mapping between the neighbourhoods of x and x', and the cross-correlation is
computed over points related by this map. We do not know the surface, and so
do not know this map, but a very good approximation to the map is provided by
the homography induced by the tangent plane of the surface at the 3D point of
interest.
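
A minimal sketch of correlation over a homography-induced map, assuming the 3x3 homography H is already known and that nearest-pixel sampling is adequate. The function names and the sampling scheme are illustrative choices, not taken from the paper.

```python
import math

def apply_h(H, x, y):
    """Map the point (x, y) through the 3x3 homography H."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)

def warped_ncc(img1, img2, x, H, r=1):
    """Correlate the neighbourhood of x in img1 with its image under H in img2.
    Both windows must lie inside the images; borders are not handled."""
    p, q = [], []
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            u, v = x[0] + dx, x[1] + dy
            u2, v2 = apply_h(H, u, v)
            p.append(img1[v][u])
            q.append(img2[int(round(v2))][int(round(u2))])  # nearest pixel
    mp, mq = sum(p) / len(p), sum(q) / len(q)
    num = sum((a - mp) * (b - mq) for a, b in zip(p, q))
    den = math.sqrt(sum((a - mp) ** 2 for a in p)
                    * sum((b - mq) ** 2 for b in q))
    return num / den if den > 0 else 0.0
```

With H set to the identity this reduces to the uncorrected correlation of section 3.1, which is expected to score poorly under a 90° image rotation, whereas the correct H restores a high score.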

In the case of line matching [16] this homography can only be determined 
up to a one parameter family because a line in 3D only determines the plane 
inducing the homography up to a one parameter family. This means that for 
lines a one dimensional search over homographies is required. However, in the 
case of curve matching, the curve osculating plane provides a homography that 
may be used, and no search over homographies is required. 




Fig. 13. The osculating plane of a (non-planar) curve varies, but is always de- 
fined in 3-space provided the curvature is not zero. This plane is determined 
uniquely from the image of the curve in two views. The plane induces a homog- 
raphy between the images. 



In more detail, suppose a plane curve is imaged in two views, as illustrated in
figure 13. Then, given the tangent lines and curvatures at corresponding points,
x ↔ x', of the curves in each image, and the fundamental matrix between
views, the homography H induced by the osculating plane may be computed
uniquely [17].

An example of wide baseline matching for frames 11 and 19 of the “bottle” 
sequence (cf. figure 10) is shown in figure 14. Of 37 and 48 curves in the left and 
right images, respectively, 16 are matched, and 14 of these matches are correct. 









Fig. 14. Wide base line matching for frames 11 and 19 of the bottle sequence.
88% of the 16 matched contour chains are correct.



4 Image Matching Using Curve Verification 

It has been shown in section 2 that given a query image, a set of possible
matching images can be retrieved from an image database. This retrieval is efficient
because it is based on indexing of interest point invariants. It then remains
to determine which images in the set of possible matches do indeed match the
query image, and to rank the matching images. We show in this section that
curve matching may be used to verify image matches and also provide a ranking.
These verification tests require a multi-view relation (such as a planar projective
transformation or fundamental matrix) and the point correspondences provide
this.

Suppose the query and database images are views of a 3D scene acquired from 
different points. A first verification test is to determine if the interest point cor- 
respondences satisfy epipolar geometry constraints. This is equivalent to seeing 
if a large proportion of the matches are consistent with a fundamental matrix. 
Robust methods are now well established for simultaneously computing the fun- 
damental matrix and a set of consistent point matches, from a set of putative 
point matches, many of which may be incorrect [19, 20, 24]. It can be a weak test 
as correspondences can accidentally line up with epipolar lines, and if there are 
a limited number of interest points it is always possible to obtain a consistent 
solution for the fundamental matrix. 
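
The consistency part of this first test can be sketched as below, assuming a candidate fundamental matrix F is already available (the robust estimation itself, as in [19, 20, 24], is not shown). The one-sided point-to-epipolar-line distance, the tolerance `tol` and the function names are assumptions made for illustration.

```python
import math

def epipolar_distance(F, x, xp):
    """Distance in image 2 from xp to the epipolar line F x of the point x."""
    h = (x[0], x[1], 1.0)
    a, b, c = (sum(F[r][k] * h[k] for k in range(3)) for r in range(3))
    return abs(a * xp[0] + b * xp[1] + c) / math.hypot(a, b)

def inlier_fraction(F, matches, tol=1.5):
    """Fraction of putative matches (x, xp) lying within tol pixels of their
    epipolar line; a large value suggests the image pair is consistent with F."""
    inliers = sum(1 for x, xp in matches if epipolar_distance(F, x, xp) <= tol)
    return inliers / len(matches)
```

As the text notes, a high fraction alone is weak evidence when only a few interest points are available, which motivates the second test.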

If the images pass the first verification test then a fundamental matrix is
available between the query and database image. A second verification test is
then to use the line/curve matcher described in section 3, to see if a large
proportion of the lines and curves match. The retrieved images may then be ranked
by this proportion, a higher number indicating a greater overlap in viewpoints
of the 3D scene.
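
A small sketch of the ranking step, assuming each retrieved image comes with the number of matched curves and the number of curves detected in it. The tuple layout and function name are hypothetical.

```python
def rank_by_curve_support(candidates):
    """candidates: list of (image_id, n_matched, n_curves).
    Returns the image ids ordered by decreasing proportion of matched curves."""
    scored = [(matched / total if total else 0.0, image_id)
              for image_id, matched, total in candidates]
    return [image_id for _, image_id in sorted(scored, reverse=True)]
```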

These two verification steps are demonstrated in the following example. For
the query image on the left of figure 15 there are 11 images in the database with
more than 7 interest point correspondences (the minimum number required to
compute the fundamental matrix). These images are determined by indexing
on local intensity invariants and semi-local geometric constraints. The database
images with the highest and next highest voting scores are shown on the right
of figure 15 and in figure 16. The match with highest vote is actually correct,
whilst the other is incorrect. Both images pass the first verification test, so
a fundamental matrix is available between the query image and each of the
database images. The curve matcher of section 3 is then applied to each image
pair. In the case of the match with highest vote, 802 curve edgels are matched
(see figure 17). In the case of the match with the second highest vote no edgels
are matched. The correct match is therefore very clearly identified.

Computing the fundamental matrix as a means of object recognition has 
been proposed before by Xu and Zhang [22] amongst others. However, com- 
bining the fundamental matrix with the additional geometric and photometric 
constraints provided by curves delivers a powerful image matcher: it has the 
indexing advantage of interest points combined with the verification strength of 
curves. 




Fig. 15. Left: the query image. Right: the best match using intensity invariants. 
Interest points used during the matching process are displayed. 



5 Discussion and Extensions 

The interplay between geometric and photometric constraints has been illus- 
trated at a number of points throughout this paper. 

First, it has been shown that under image similarity transformations point
correspondences can be established between two images simply by employing the
discriminating power of local intensity patterns. However, blindly voting on
individual invariants is not sufficient to guarantee the correctness of the image match
in database indexing. It is crucial to introduce further geometric constraints in the
form of semi-local coherence on the point patterns.

Fig. 16. The second best match using intensity invariants, which is in fact
incorrect. Matched interest points are displayed. This match is rejected by the
curve verification test.

Fig. 17. Verification test: matched edges for the image pair in figure 15. 802
edgels have been matched.

Second, it has been shown that although the matching of 3D curves between 
two images would appear to be very ambiguous - since a point on a curve in one 
image could potentially match with any of the points at which its epipolar line 
intersects curves in the other image - the introduction of photometric constraints 
on the curve neighbourhoods virtually eliminates the ambiguity. 

These two techniques have been combined for the application of matching 
images of a 3D scene from a large set of images acquired from different view- 
points. Curves, more than points, capture the structure of the scene, and the 
number of curve matches may be used to rank the image matches. Indeed an 
extension of this technique would be to detect change between the images (such 
as the addition or removal of a building [8]) by the spatial arrangement of the 
unmatched curves and lines. 

In the context of image retrieval 3D effects can often be ignored, as a scene 
may be planar or 3D effects are not significant. The map between images is then 
a simple planar homography (projective transformation). This homography may 
be computed from the interest point correspondences. Curve matching using 
both geometry and photometric information can then proceed in much the same 
manner as that of section 3, with the homography providing the curve point 
correspondences. 
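
As an illustration of the planar case, the sketch below recovers a homography from exactly four point correspondences by solving the standard 8x8 linear system with the last entry of H fixed to 1. This is an assumption-laden simplification: in practice many correspondences and a robust estimator would be used, and the function names are hypothetical.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def homography_from_4(src, dst):
    """Homography mapping four src points to four dst points (H[2][2] = 1).
    Requires that no three of the points are collinear."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve(A, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]

def apply_homography(H, p):
    """Transfer the point p = (x, y) to the other image."""
    w = H[2][0] * p[0] + H[2][1] * p[1] + H[2][2]
    return ((H[0][0] * p[0] + H[0][1] * p[1] + H[0][2]) / w,
            (H[1][0] * p[0] + H[1][1] * p[1] + H[1][2]) / w)
```

Here `apply_homography` plays the role that the epipolar line intersection plays in section 3: it supplies the corresponding point x' for each curve point x.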



Acknowledgements 

We are grateful for discussions with Joe Mundy. The algorithms in this paper
were implemented using the IUE/targetjr software packages. Financial support
for this work was provided by the UK EPSRC IUE Implementation Project
GR/L05969, and EU Esprit Project IMPACT. Cordelia Schmid has been
partially supported by the HCM program of the European Community.

References 

[1] T.O. Binford and T.S. Levitt. Quasi-invariants: Theory and exploitation. In 
DARPA Image Understanding Workshop, pages 819-829, 1993. 

[2] R.C. Bolles and R. Horaud. 3DPO: A three-dimensional Part Orientation system.
The International Journal of Robotics Research, 5(3):3-26, 1986.

[3] R.C.K. Chung and R. Nevatia. Use of monocular groupings and occlusion analysis 
in a hierarchical stereo system. In Computer Vision and Pattern Recognition, 
pages 50-56, 1991. 

[4] L.M.T. Florack, B. ter Haar Romeny, J.J. Koenderink, and M.A. Viergever.
General intensity transformation and differential invariants. Journal of Mathematical
Imaging and Vision, 4(2):171-187, 1994.

[5] B.V. Funt and G.D. Finlayson. Color constant color indexing. Transactions on 
Pattern Analysis and Machine Intelligence, 17(5):522-529, 1995. 







[6] C. Harris and M. Stephens. A combined corner and edge detector. In Alvey Vision 
Conference, pages 147-151, 1988. 

[7] P. Havaldar and G. Medioni. Segmented shape descriptions from 3-view stereo. 
In International Conference on Computer Vision, pages 102-108, 1995. 

[8] A. Huertas and R. Nevatia. Detecting changes in aerial views of man-made struc- 
tures. In International Conference on Computer Vision, pages 73-80, 1998. 

[9] J.J. Koenderink and A.J. van Doorn. Representation of local geometry in the
visual system. Biological Cybernetics, 55:367-375, 1987.

[10] Y. Lamdan and H.J. Wolfson. Geometric hashing: a general and efficient
model-based recognition scheme. In International Conference on Computer Vision, pages
238-249, 1988.

[11] K. Nagao. Recognizing 3D objects using photometric invariant. In International 
Conference on Computer Vision, pages 480-487, 1995. 

[12] S.B. Pollard, J.E.W. Mayhew, and J.P. Frisby. PMF: A stereo correspondence 
algorithm using a disparity gradient constraint. Perception, 14:449-470, 1985. 

[13] C. Schmid. Appariement d'images par invariants locaux de niveaux de gris. Thèse
de doctorat, Institut National Polytechnique de Grenoble, 1996.

[14] C. Schmid and R. Mohr. Local grayvalue invariants for image retrieval. Transac- 
tions on Pattern Analysis and Machine Intelligence, 19(5):530-534, May 1997. 

[15] C. Schmid, R. Mohr, and Ch. Bauckhage. Comparing and evaluating interest 
points. In International Conference on Computer Vision, 1998. 

[16] C. Schmid and A. Zisserman. Automatic line matching across views. In Conference 
on Computer Vision and Pattern Recognition, pages 666-671, 1997. 

[17] C. Schmid and A. Zisserman. The geometry and matching of curves in multiple 
views. In European Conference on Computer Vision, 1998. 

[18] M.J. Swain and D.H. Ballard. Color indexing. International Journal of Computer
Vision, 7(1):11-32, 1991.

[19] P. H. S. Torr and D. W. Murray. Outlier detection and motion segmentation. In
Proc SPIE Sensor Fusion VI, pages 432-443, Boston, September 1993.

[20] P. H. S. Torr and A. Zisserman. Robust computation and parameterization of 
multiple view relations. In International Conference on Computer Vision, pages 
727-732, 1998. 

[21] A.P. Witkin. Scale-space filtering. In International Joint Conference on Artificial 
Intelligence, pages 1019-1023, 1983. 

[22] G. Xu and Z. Zhang. Epipolar Geometry in Stereo, Motion and Object Recognition.
Kluwer Academic Press, 1996.

[23] Y. Zhang and J.J. Gerbrands. Method for matching general stereo planar curves. 
Image and Vision Computing, 13(8):645-655, October 1995. 

[24] Z. Zhang, R. Deriche, O. Faugeras, and Q.T. Luong. A robust technique for 
matching two uncalibrated images through the recovery of the unknown epipolar 
geometry. Artificial Intelligence, 78:87-119, 1995. 




Towards the Integration of Geometric and 
Appearance-Based Object Recognition 

Joe Mundy and Tushar Saxena

General Electric Corporate 
Research and Development 
Niskayuna, NY, USA 

1 Overview 



rogr ss 


0 j 


r og 0 


h s 


r 


1 


V ly slow ov 


r h 


Is V y rs 


or so. sp 


h 


os r 1 


progr ss 


our 


u rs 


g u 


0 r s r h 


PP r 


■ s 


m ho s 


V r s 




g 


r mo Is 


our 


1 y 0 r og- 


z m -m 




ur lo 


j s 


lu 


r 


s s w h 


ompl 


X Hum 0 


h s 0 s g 




ly r s 














w 


mph 


s s w 11 


ss ry 


0 g 




su s 


1 r 


s r og - 


opr orm 


hr r 


urr ly 


wo m 


hms 


work 1) h or 1 


mo Is 0 0 


j 


PP r 


1 rg ly 


s 


0 


g om ry 


2) 


mp r 1 mo - 


Is r V 


rom 


mgs mpl s s 


0 




r r pr s 


0 


so s y. 


og 0 


sys 


ms s 0 


- 


r V 


g 


om r mo 


Is 2 


s X mpl 


0 h rs 


ppro h 1 


r mo Is 


us 


g P r ] 


r og 


0 m ho s 


.g. L M 


s 


x mpl 


0 h s 


0 


p 


r gm. 


h r 


ppro h lo 


solv h 


pro 


1 m 0 r og 


z g 


0 j 




lu r 


s 


u r ompl X 


Hum 0 




sh ows. 















Our urr orm 1 mo Is o h worl o o ou or h ull ompl x y 
ooj ppr uo ssuhs r-rfl os low r solu o 
ompl X sur rfl vyu os. hsm m so slool- 
1 X mpl so llhsu os wh h oj my vw.v w 
vr Issuh sv wpo Hum o r orqur so hous s 

o s g mp r 1 s mpl go o s. h s o j ’s pp r s 

sg ly r hosrvos owrr lug h 

ppr sp.holusowhohrojssk o ou h 
oil o pro 1 m om s r 1 . 

ss o s y sur s r p os s mu h mor r 1 o y w h 

mo r mgs gm o Igor hms wh h pr s rv mos o h sur opol- 
ogy hr h m g proj o . o xplo h s opology o 

o j sur s h r pp r hrough phys 1 mo Is r hr h 
hrough h gh m so 1 r s - gh or v w sp s. hrough r p v s 

proj V pho ogr mm ry ov r h 1 s v y rs h g om r prop r so 
m g proj o r r so ly w 11-u rs oo . r w k o 

qu s 0 0 sur prop r s h xplo o o h s prop r s 

h r og o pro ss. 



D.A. Forsyth et al. (Eds.): Shape, Contour LNCS 1681, pp. 234—245, 1999. 
© Springer- Verlag Berlin Heidelberg 1999 




Towards the Integration of Object Recognition 



235 



h s uss o o ollow prim rysoxprm sr sr o 
llus r h r pr s o s gm o ppro h mpl y h gr - 
o o g om r pho om r r u s. 

2 Edge-Based Regionization 

2.1 Regions Derived from Edgel Boundaries 

h r fl prop r so sur r s xplo rms o r g o s 

y V r ous low-or r mo Iso syvr o.hs gm o 

ppro h s s o h h mo r g ors h v rly 

pr 00 syso us. hrm glo os h 

mg r ou or y r 1 V ly slowly v ry g s y mo 1. r w 
osrl r qur vr os. .pi r qur sy sur 

mo Is. 




y li s ppro hw hvhr ol ury ompl ss o 

sur ou r s wli 1 pro ug v srpooli ror 

syvr o o u orm rgos. hr olgs gm o s show 
gur 1. gur 2 h ou r s gur 1 h v polygo z h 

rgos orm y jo g r ou ry po s us g o s r r gu- 





236 Joe Mundy and Tushar Saxena 



1 o ^ . h r g o s suppor mul ply- o opology o ou 

or so uous syvr oswh Irgrgo. h r gul o 
gsrr hrsgl suppor . 

hs g- s rgoz o ssllqu osrv v s shoul s 

urhrmrg gorgos shoul rr u r sp mo Is or s y 

sur V r o . 




b) 



Fig. 2. a 


s a 




c 


s 


V a 


s s 


c 


s a a 


a 






a $ a 


a c a 






c a s a 




a a 




s a 


c s s ws 


a 


s 


s 


ac 


av c a 


$ 


a s 


a 


s a 




saw 


a 


fl c a c 


va 


a s V 


s ac 







2.2 


Intensity Models 










s 


s r 1 0 orm 1 rg r r g o 


s 


so h 


or 1 mo 1 or 


s y 


V r 


0 . Ov r 11 su h mo Is h 


V 


1 rg ly 


r 0 h L m 


r 


r fl 


u 0 s h s 


gr 


1 m h m 


1 prop r s 


los ly 


1 ks 


sur g om ry w h pp 


r 


ow V r 


h r s 0 s r 1 


s p - 


sm 


w h h V s 0 ommu y 


h 


L m r 


mo 1 s r Is 





hspprws sph ssu or h mom o s r h y 

ro us r ov ry o s y r g o prop r s w 11 o ly pr 1 or low-or r 
mo Is. h s work w os rrprs ghvr oovr rgo s 



^ The authors are grateful to Dani Lischinski of the Hebrew University of Jerusalem 
for providing the triangulation code. 







hrl rorqur .h sh sy sur s mo 1 s hr 

$I(x,y) = I_0 + ax + by$ or $I(x,y) = I_0 + ax^2 + bxy + cy^2 + dx + ey$. u h mo Is

h V pr V ously us h so- 11 ac g o ppro h s o 

u 1 r 1 k . 
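
For illustration, a low-order intensity model of the planar form I(x,y) = I_0 + ax + by, or of the quadratic form I(x,y) = I_0 + ax^2 + bxy + cy^2 + dx + ey, can be fitted to the pixels of a region by ordinary least squares. The sketch below is an assumption on our part about how such a fit might be carried out (normal equations solved by Gaussian elimination); the function names are hypothetical and nothing here is taken from the authors' implementation.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fit_intensity(samples, quadratic=False):
    """samples: iterable of (x, y, intensity) for the pixels of one region.
    Returns [I0, a, b] for the planar model, or [I0, a, b, c, d, e] for the
    quadratic model I0 + a x^2 + b x y + c y^2 + d x + e y."""
    def basis(x, y):
        if quadratic:
            return (1.0, x * x, x * y, y * y, x, y)
        return (1.0, x, y)
    n = 6 if quadratic else 3
    S = [[0.0] * n for _ in range(n)]   # normal matrix, sum of phi phi^T
    t = [0.0] * n                       # right-hand side, sum of phi * I
    for x, y, value in samples:
        phi = basis(x, y)
        for r in range(n):
            t[r] += phi[r] * value
            for c in range(n):
                S[r][c] += phi[r] * phi[c]
    return solve(S, t)
```

The residual between the fitted model and the observed intensities then gives a simple measure of how well a region is described by the low-order model.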

hsrprs o osomk y omm m ou h r fl 

mo 1 u ssum s h g v r fl mo 1 pro u s slowly v ry g 

s y or smoo h sur s r 1 v ly u orm Hum o . s 

h vol oso hs ssump o su h s sp ul r h ghl gh s r Ir y p- 

ur rms oh so u soh glou rss show gur 1 

)• 

s X mpl orgoxr o so sy sur g o - 

s r h s mpl s show gur . gur h s y s o o s 




Fig. 3. s a 

s sc s w 

s s ac 



a a s ac s 
as a a a a 



s a 
X a 



ross h pi r 
o or rup ly u 
ompl ly or h 
s r gur 

mos o h r g o 



u V r s slowly u ogr ulvr os Hum 
o sh ows. ow V r pi r mo 1 ou s rly 

syvr osu o Hum o . h s r sul s mo - 

s o surpr s g h low or r mo 1 ou or 

syvr o osuh s mpl s s gur . Iso 








Fig. 4. a 

a 

w s 

va a a 



ac s 3 a 

s a s s a 
a ca 

ca s s w 



s ac 
as s 
s s a s 



a a s 

saw a s 
s a s 



h r r m s h ssu o how h s s gm o s r p o woul v ry w h 
V wpo 



2.3 Variation of Intensity Models with Viewpoint 

s o V ws o h rm ssu rs show gur 1 w 11 us o 

llus r h o V wpo V r o o h s y mo 1. h s v ws 

r show gur . hrgoshpvrsrl vly slowly h r s u 1 

s y rror s Iso r 1 v ly sm 11. h s r sul mo s r s h h sh p 

rpr o o sur s p ur h h gh r or r rms. ow v r h 

V r o o h u rly g low or r v r o s r so ly o uous w h 
r sp ov wpo gh hmjorvr o s glo 1. hsv r o 

s show gur 

2.4 Remarks 

h s r sul soss whhxpr ppr - s sys ms 1 k 

LM.hshm vro sy ross v ws s p ur 

low or r u o 1 pprox mos. h soLMhsu osr 
g V ors oh ovr mrxomg syrrys wh h ou h 

o j h V w. r h pprox mos rm rom p x Is wh h 

r r or o r g o ou ry rm y m g so u s. 

rgu h us g r or o s gm - 1 r g o s s mu h mor 

si rprs o oso p ohvg sly pro-1 

kgrou o h o j . h s ppro h h s m ou ry wh h s us o 

g om r orm o ouhoj sus orvh sys- 

r u o . s show h xs o hmgv wpo r v rom 







[Figure 5 panels: views at -20, 0, 20, 40, 60, 80 and 100 degrees.]



Fig. 5. a X ac 

a was X a 
s s s a ca 

s a va a s ac 



ass as 
s a was a a 
s 

a a a a 



s s a 

a a a 

s s 










h ou ry g om ry hus h syrprs o x 

orropr p vr o. 

3 Affine Indexing

V s o r g o s wh h h V smoo hly v ry g s y h x ssu s 
o xgshm sohs prop r s. h ul m go 1 s 

o pro u gr X o g om r s y m sur m s. O 

ppro ho gr o s sugg s y h sp 1 prop r s o mg 

proj o . 




Fig. 7. 3 a a c a a s 



os r gur 7 wh h shows h proj oo - rm(rh- 

ro ) o o m g pi . s ssum h h s proj o s rr ou y 

m r wh r h proj o m r x s o h orm 

\[
T = \begin{bmatrix}
t_{11} & t_{12} & t_{13} & t_{14} \\
t_{21} & t_{22} & t_{23} & t_{24} \\
0 & 0 & 0 & t_{34}
\end{bmatrix}
\]
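
As a hedged illustration of a projection matrix of this form (last row 0, 0, 0, t34, i.e. an affine camera), the sketch below projects 3D points and can be used to check one characteristic affine property: the projection of a midpoint is the midpoint of the projections. The function name and the numerical values are ours, chosen only for the example.

```python
def project_affine(T, X):
    """Project the 3D point X = (x, y, z) with a 3x4 matrix T whose last row is
    (0, 0, 0, t34); the homogeneous scale w is then the same for every point."""
    Xh = (X[0], X[1], X[2], 1.0)
    u, v, w = (sum(T[r][c] * Xh[c] for c in range(4)) for r in range(3))
    return (u / w, v / w)

# Example values (arbitrary): midpoints are preserved by the projection.
T = [[1, 0.5, 0, 2], [0, 1, 0.25, -1], [0, 0, 0, 2]]
A, B = (0, 0, 0), (2, 4, 6)
M = tuple((a + b) / 2 for a, b in zip(A, B))
pA, pB, pM = project_affine(T, A), project_affine(T, B), project_affine(T, M)
```

Because w does not depend on the point, image coordinates are affine functions of the 3D coordinates, which is what allows image features to be expressed in a 2D affine coordinate frame.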

r h s proj oil somr Iro Im y 

xpr ss g m g s ur s 2- oor r m . hus h o ly 

vr osooj ppr hsrprs o r u ohgs 





V wpo . or h m r h oor so mg proj o 

vrywhhor oohvwr o.h sh roh mr 

o s r y wh r o ly s pos o o h v w-sph r s h 

proj g om r s ru ur . hus wo gr s o r om sp y h v w g 
o os. 

Ohohrh y-sruur pi ol 

r m OSS go our po s. Iso h r 1 o sh p w y - s ru ur 

s 2- proj osu ygrl rs orm o o 

sp s V ry h g s xpr ss oor so g w h. ollows 

h h wo u k ow prmrsoh mr rovr rom 

h proj m g pos o o h our h po h - o 1 r m . 

hkypo sh h mrsk ow c s c 

how V r s ss ry o s 1 sh our orr spo s w s s or 

o j mo 1 s proj o . h s orr spo s r k ow h h 



rvwo oj X orgohmrv wpo s 

w 11 s h oor so h proj s ru ur . 





h s 


X 


gs h 


m s s m 1 


r 0 


h ^ 


0 


0 s 


X 


P 


h 


mo 


1 




s 


r 


r ly 




S 0 




h k ow 


m 


r V 


wpo 


0 


s 


rr 


ou 




r 


s 


gh or s 


r 


h 


s 


0 1- 


ompos 


0 s 


0 h 




X 


sp 


• y 


0 


1 r 


s 


our 


ppro 


h 


s s m 


1 


r fl vor 0 


V r 




mo 1 




X 


g wh r 


h 




X 


r 


ly 


ss 


1 s s 


or r pr s 




0 0 


1 h 


m g 




ur 


s. 


ow V 


r 


u Ik 


V 


wpo 




V r 




s r p 0 




s 


ss 


ry 0 


qu r 


su 






um 


r 


0 V 


WS 0 


h 


0 j 




0 ou 


or 


h 


V r 


0 


0 


j 


pp 


' r 




u 


r 


m 


r mo 


0 
































0 V lu 




hs 


0 


pos 


gm 




ou 


r s 


s 


r s 0 V 


WS 0 




r 


k 


h Im w r 




qu r 




s gm 




us 


g h 


s h m 


s 


r 






0 


2. 


h 


s gm 




0 s 


r 


show 


gur . 


h 


WO 




or 




0 p 


r 


m 


rs 



r show 


gur 


9. 


h r sul s 


r 


goo gr m 


up 0 


ou 0*^ wh r 


h r g 0 


us 0 


r V 


h 


p r 


m rs s s 


rly 


go. h s X- 


p r m 


shows h 


h 


r s u 


0 


1 r 1 0 sh p 


w 


h r m 


oor 


s m r 


V wpo 


0 


SSU 0 0 


s r 


pro g s 


h h 


oor 




s ao 


ai r 


■ r 1 V ly V 


r 


0 V wpo ov r 


h r g 


-lO'^ 


0°. 


h s slow 


V r 


0 s goo rom 


X g po 0 


V w u 


or 


V r 


g h u 


0 


1 r 1 0 sh p 


0 


rm u 1 


m r pos . h s 


P 


1 y s us 


0 


ppro h 0 


h 


gr 0 0 


s y r 


us h 


X S 0 . 











4 Initial Thoughts about Intensity Indexing 

ssr o2. sslohr rzh syo g- 

s gm r g o w h low or r poly om Is urhrho sohs 

pprox m o s V ry mor or 1 ss o uously w h v wpo . h s o 
h X jus s r h V wpo rlvohoj -rr- 

r m rm rom s gl m g m sur m p o 

s ru ur . 




Fig. 8. 


s 


5 


a 


5 


a 


s 


a 








a 


s 




a 




c 




a 


a s 






a Vi 


as 








c 






w 


V 


c ^ 


ass c a 




5 


s 


C 


a 


s 


w c 


a ca 


a 




ca 


V 




















hus s poss 1 0 


X h 




s y 


g 0 


s or 


g V 


0 


Hum 


0 


r 0 


or 


g 0 


h V 


wpo 


orr spo 


gr 


s gm 


0 




h s V ry pr 1 m 


ry 


V s 


g 0 


h s y 


r 


u 


0 us 


s 




X r 


ly u 




US 


0 V r 


y h su 


SS 0 


n 


r r V 


y 




g om r 


X 


g- 


1 m 


ly h 


ompl ppro 


h V 


r V 


V r 


sur 


r u s 


rom 


h r 


g 0 pprox m os 






0 


llus r 




hs 


h hr 


P 


r m 


r mo 


1 0 Or 




y 


s us 


0 


h 


V r 0 


0 r g 0 


s 


y w h 


0 j or 


0 




xp r m s 


hroughou h 


s p p r 


h 


0 j 


s ro 


0 ur 


1 




r 1 V 


Hum 




0 


m r 


r 


0 s 


r k p 


X . hus 


h 


0 s 


r go 


s 


y V 


r os 


r u 


0 0 h h g s 


Hum 0 




V 


r 0 


r 1 


V 


0 h lo 


1 sur 


orm 1. 










h 


Or - 


- y 


r h ory 


s r V 


rom 


mo 1 or sur r fl 






0 r 


omly 


or 


s. s g 


h s mo 


1 h sur 


rough 


SS 


1] 


0 


s y w 


s jus 


0 g V 


h 


s 


0 h 


h r 


sul 


S s] 


gur 10. 


h 


p r m 


rs r p h 


sur 


1 


0 cr h su 


roug' 


m sur 


s V r 


0 sur 


slop 


Eo 


h 


Hum 


0 










Towards the Integration of Object Recognition 243 




[Figure 9 plot; horizontal axis: Turntable Orientation]



Fig. 9. a a a a s V as 

s a V ca ax s was ac as 

a s a was c ca a v w a a s 

CSC va s ao a ai ca a a c as 

ca vs X a va s a s w 



h gr m s X 11 how v r h mo Ihssu grsor- 

om so h goo h s s mpl s s o surpr s g. Iso qu s o 

r s ou h pr 1 y o ssum g u orm x Hum o 

sy. h xsp hsl o v lopm so us j rgoso 

r V mor vr srposusgh mo 1. 

or X mpl os rhr j rgosor mor g r lly hr r g o s 
h ur g h X g pro ss h w h g r 1 or 

o . L ssum h Or - y r r fl mo Is r v 1 1 or 

h r g o . s jus show h s r fl mo Is r v mp r lly 

y gosrv rgo sy . surhr ssum h h 1- 

lum osos ohr o mguovrhrgosh 

h Hum o s y r o r v rom h o s rv mg 

rgo s s. 




244 Joe Mundy and Tushar Saxena 




Fig. 10. 


c 


a s 






a a 


w 


S V 


va a 


av a 




$ 






s 


S V 


s 


c as 


s w 






av a 


s 


V 


a a a 


a X 


a 




s 










a a 


a a 


s 


s a 


P 


.9 a <7 


0° 




c 


a 


s 


Eq was s 


a 


a 


a a Or 


0° 













h s pro ss s 


1 


y 


h 




X g pro 


ss 


0 




0 


s 


h 


or 0 0 h 


m r 


s k 


ow 




V r 




r 


m 


r 1 


V 0 


h 


r g 0 s. h s 


m 


r or 




0 


1 k 


0 


h 




u 1 


u 1 




or 0 g V 


1 


r 


m 


g 


qu s 0 pro 


ss 


su 


h 


s 


ur 


1 


h r 1 V s 


y 0 


our 


h r 


g 0 


h us 




s 


s 


0 


ry 


X 



or s V 1 0 0 pur ly g om r x g. 



5 Discussion 



w sugg s o 
mo Is o g om 
r og o . h 
orm o 
V r 0 0 

p r m r z 
pr pi ompo 



s h V o 

ry Hum 

X 

c a a a c 
m g 

w h r sp 
s o o j 



r so how o m gh om 
o w h h mp r 1 pp r 
g mo 1 s r ov m gh 

wh r h o j g om ry s 
s ov r s o V ws. h s 
o V wpo X ly h s m 

m g V ws . . h ppro h 



h or 1 
mo Is or 
os r 
p ur h 

V r os 
m r s h 
L M. 



V r 








ur h r s propos h mgs gm o so opolog lly 
ur rgo ou rss rrprs oorh syppr 

o sruur h h g u oso syvr lyovrv wpo 

hsrprs o sky oh o gur o o h o j s r 1 

sruurrhrh h ssu os rv rompr pi ompo s. h 

us o g o sur s ur pi m o g om r ur s. Mo r 

g ors pi ou r s o su -p X 1 ur y. 

h s llus r o s m r ly sugg sh slyoh groo sy 
g om rrprs os. hrlo rm o o h ppro hwllr qu r 

h V lopm o ull o j r og o sys m s o su h gr o . 

w 11 ss ry o show hh oo sy-rv xg 

V 1 o r u s pro u sg h m opr orm 

References 

[1] P.R. Beaudet. Rotational invariant image operators. In International Conference
on Pattern Recognition, pages 579-583, 1978.

[2] W. Eric L. Grimson. Object Recognition by Computer: The Role of Geometric
Constraints. The MIT Press, Cambridge, Massachusetts, London, England, 1990.

[3] R. M. Haralick and L. G. Shapiro. Computer and Robot Vision, volume 1.
Addison-Wesley, 1992.

[4] D. Jacobs. Space efficient 3-d model indexing. In Proceedings of the International
Conference on Computer Vision and Pattern Recognition, pages 439-444, 1992.

[5] Joseph L. Mundy and Andrew Zisserman, editors. Geometric Invariance in
Computer Vision. MIT Press, 1992.

[6] H. Murase and S. Nayar. Learning and recognition of 3d objects from appearance.
The International Journal of Computer Vision, 14(1):5-24, 1995.

[7] M. Oren and S. Nayar. Generalization of the Lambertian model and implications
for machine vision. The International Journal of Computer Vision, 14(3):227-253,
1995.




Recognizing Objects Using Color- Annotated 
Adjacency Graphs 



Peter Tu, Tushar Saxena, and Richard Hartley

GE - Corporate Research and Development, 

P.O. Box 8, Schenectady, NY, 12301. 

T. Saxena : CMA Consulting Services, Schenectady, NY 12309 



Abstract. We introduce a new algorithm for identifying objects in clut- 
tered images, based on approximate subgraph matching. This algorithm 
is robust under moderate variations in the camera viewpoints. In other 
words, it is expected to recognize an object (whose model is derived 
from a template image) in a search image, even when the cameras of the 
template and search images are substantially different. The algorithm 
represents the objects in the template and search images by weighted 
adjacency graphs. Then the problem of recognizing the template object 
in the search image is reduced to the problem of approximately match- 
ing the template graph as a subgraph of the search image graph. The 
matching procedure is somewhat insensitive to minor graph variations, 
thus leading to a recognition algorithm which is robust with respect to 
camera variations. 



1 Outline 

The present paper describes a method of finding objects in images. The typical
situation is that one has an image of the object sought: the template image.
The task is to find the object in a new image taken from a somewhat different
viewpoint, possibly under different lighting. The method used is based on
approximate attributed graph matching. As a first step, the image is segmented
into regions of approximately constant color. The geometrical relationship of
the segmented color regions is represented by an attributed graph, in which
each segment corresponds to a vertex in the graph, and proximate regions are
joined by an edge. Vertices are annotated with the size, shape and color of the
corresponding segment. Finding an object in a new image then comes down to
an approximate graph-matching problem, in which a match is sought in the new
image for a subgraph approximating the one corresponding to the sought object.
The graph matching can only be approximate, because of the inexactness of the
segmentation process and the changed appearance of the object due to changes of
lighting, viewpoint and possible partial occlusion.

There has been much previous work in the area of recognition from color. An
important body of work is concerned with what has been broadly called color
constancy [27, 12, 9, 11, 17, 1, 10, 4]. The concern of such papers is to recognize
an object based on its color alone. Typically, eigenspace or histogram techniques



D.A. Forsyth et al. (Eds.): Shape, Contour LNCS 1681, pp. 246-263, 1999. 
© Springer- Verlag Berlin Heidelberg 1999 







or similar approaches are used to characterize an object. The methods rely on
the distribution of colors in a usually vaguely defined region of an image. Under
different conditions of lighting, the histogram or eigenspace region of a surface
will vary. Variously sophisticated models have been proposed for this
changeability, ranging from very simple models such as simple intensity
variability ([27]) to affine color transformation ([1]) and physical atmospheric
illumination models ([19]). Generally, such papers are not specifically concerned
with locating an object to recognize in an image, or with finding an object that
occupies only a small part of an image. An exception is [19], in which recognition
is at the level of individual pixels. In addition, any geometrical information
about the relative locations of different colored parts of the image is usually
lost (for instance, in histogramming techniques).

Similar in concept, a recent popular approach to recognition has been the
appearance-based learning method of Nayar and Murase, also Lin and Lee ([22,
25, 26, 21]). This approach uses surfaces in an eigenspace to represent the
views of an object under different poses. The method becomes increasingly
complex as the number of degrees of freedom of pose and lighting increases. Once
more, such methods are best suited for recognition of an object that constitutes
a complete image. Searching for a candidate object in a complex scene is treated
as a separate issue.

An alternative line of attack on object recognition has been to use the
geometry of the object. Typically this involves edge detection and grouping,
followed by some sort of indexing or template matching based on geometry. Among
many possible references we cite only ([32]). The present work seeks to
amalgamate the color-constancy and geometric approaches to object recognition.
Previous work in amalgamating geometry and color includes [23, 5, 24, 7, 30, 6].
Earlier work of Hanson and Riseman ([15, 16]) laid a foundation for this approach.
Among this cited work, the approach of [5] is related to ours, dealing with blobs
which are ellipsoidal areas of consistent color. Similarly, in our approach the
regions of an image segment obtained from segmentation are represented by their
principal moments, effectively treating them as ellipses.

2 Extracting Object Faces from Images 

The first step in the algorithm is the division of the image into faces (or
regions) of approximately constant color. The face extraction process proceeds
in three basic steps, which will be outlined in subsequent sections.

2.1 Detecting Approximate Region Boundaries 

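First, the boundaries (that is, edges) hypothesized to enclose the regions are
detected. As a first step in this process, the edges in the image are detected
using a Canny-style edge detector, and line segments are fitted to the resulting
edgels. It is reasonable to assume that the region boundaries pass through the
resulting line segments since, under our assumptions, a face boundary will produce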



248 



Peter Tu, Tushar Saxena, and Richard Hartley 




Fig. 1. Left: The result of edge detection in a template image containing a cup.
Right: The result of adjusting lines fitted to the cup edge segmentation.



a discontinuity in the intensity and color variation of the enclosed region, and
thus show up during edge detection. However, typically the boundaries are
detected in the form of numerous small broken line segments (Figure 1 (left)).
It is difficult to identify the exact geometry of the enclosed face directly from
the line segments. To improve the boundary geometries, we use some heuristics
to adjust the line segments. Some of the heuristics are:



- Merge lines that are nearly collinear and within some proximity threshold of
each other. This is useful in creating a single edge which may have broken
up into various small (but nearly parallel) segments during edge detection.

- Create T-junctions from pairs of lines, one of which ends close to the inside
of, and away from the end-points of, the other. This is useful in creating
intersections of edges on occluding objects. This will help in obtaining
well-defined faces on both objects.

- Intersect lines whose end-points are close to each other and where the lines
are at obtuse angles. This creates the corners of an object which may not have
been detected during segmentation.



Figure 1 (right) shows the result of applying these heuristics to the segmentation
of the image in Figure 1 (left). In our experience, these heuristics aid
significantly in correcting most of the degenerate boundary segments.



2.2 Estimating Initial Uniform Regions: Constrained Triangulation 

Using the boundary line segments from the previous step, we now generate an
initial partition of the image into triangles of uniform intensity and color. This
is accomplished by a constrained triangulation of the boundary lines. A constrained
triangulation produces a set of triangles which join nearest points (end-points of
the lines) but respect the constraining boundary lines. That is, each boundary
line segment will be an edge of some triangle.










Fig. 2. Constrained triangulation on the adjusted cup lines. On the left, the
triangles only; on the right, the triangles superimposed on the image.



Since all triangles are formed from the end-points and lines on the boundaries
of the faces, each triangle lies completely inside a face. Moreover, since the
triangles cover the whole image, each face can be represented as a union of a
finite number of these triangles. As an example, see the result of constrained
triangulation on an image segmentation in Figure 2.



2.3 Extracting Object Faces: Region Merging 

In the next step, a region-merging procedure is used to incrementally generate
the visible object faces in the image. Starting with the triangular regions from
the constrained triangulation, neighboring regions are successively merged if they
have at least one of the following properties:

1. Similar color intensities: Two adjacent regions are merged if the difference
between their average color intensity vectors is less than a threshold. This
is a reasonable merging property, since neighboring faces in objects are at
angles to each other and are likely to cast images of different intensities.
As a refinement of this method, one could merge two regions based on a
decision of which of two hypotheses (the two regions are separate; the two
regions should be merged into one region) is preferable, based on the color
statistics of the regions. In addition, a linear or more complex color gradient
over a face could be modelled. These methods have been used in [15, 16], but
we have not tried them yet.

2. Unsupported bridge: Two adjacent regions are merged if the percentage of
the common edge between them which is unsupported is larger than a threshold.
An edge is said to be supported if a specified percentage of its pixels
belong to an edgel detected by the edge detector. Merging based on this
property will ensure the inclusion of those boundary segments which were missing
from the detector line segments derived from edgels. This is demonstrated by
the fact that a number of non-constraining lines in the triangulation end up
being found supported.

At each merge, the properties (size and color) of the new larger region
are computed from the properties of the two regions being merged.




Fig. 3. Faces of the cup (left) and urban scene (right) extracted by our algorithm. 



The merging iteration continues until the color intensities of each pair of
neighboring regions are sufficiently different and most of the common edges
between them have support from the segmentation. Under our assumptions about
the nature of the objects and the illumination, it is reasonable to assume that
the resulting regions are likely to be images of the faces of objects pictured in
the image. As an illustration, see Figure 3, which shows the faces extracted
using our algorithm.

The result of the segmentation and merging algorithm is a set of regions with
associated color (RGB) values. Typically, there remain small narrow regions lying
along the region boundaries. These are removed from consideration in the sequel,
since they do not represent meaningful faces in the image but are caused by the
color transition across a boundary. Similarly, any residual very small regions,
less than about 30 square pixels, are removed. Such small regions sometimes remain
after the region-merging stage, since slight variations of color have prevented
them from being merged with adjacent larger regions.



3 Deriving Graph Representations of Objects 

Once all the object faces in the image have been generated, they are represented
as a graph. To capture the relative placement of the objects in the image
and the topology of the scene, an adjacency graph of the faces in the scene is
constructed.

Each vertex in the graph represents a region, and is annotated with the shape,
position and color attributes of the region. Shape is represented by the moment
matrix of the region, from which one may derive the area of the region, along
with the orientation and ratio of the principal axes of the region. In effect, the
region is being represented as an ellipse. This shape representation is of course
only an extremely rough representation of the shape of the region. However, it is
also quite forgiving of variations of shape along the boundaries, or even a certain
degree of fragmentation of the region. Since matching will not be done simply on
the basis of a region-to-region match, but rather on matching of region clusters,
this level of shape representation has proven to be adequate. More precise shape
estimates have been considered, however. The representation used must be dictated
by the degree of accuracy and repeatability of the segmentation process. The
color of the region is represented by an RGB color vector. Other representations
are of course possible, and have been tried by other authors ([15, 16]).
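The step from a moment matrix to an ellipse-like description can be sketched as follows. This is only an illustration under stated assumptions: the function names `region_moments` and `ellipse_from_moments` are hypothetical, a region is assumed to be given as a list of pixel coordinates, and the moments are the normalized central second moments; the paper does not specify these details.

```python
import math

def region_moments(pixels):
    """Area, centroid and normalized central second moments of a pixel list."""
    n = len(pixels)
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n
    mxx = sum((x - cx) ** 2 for x, _ in pixels) / n
    myy = sum((y - cy) ** 2 for _, y in pixels) / n
    mxy = sum((x - cx) * (y - cy) for x, y in pixels) / n
    return n, (cx, cy), (mxx, mxy, myy)

def ellipse_from_moments(mxx, mxy, myy):
    """Orientation (radians) and principal-axis ratio of the moment matrix."""
    mean = 0.5 * (mxx + myy)
    root = math.hypot(0.5 * (mxx - myy), mxy)
    lam_major = mean + root          # larger eigenvalue of [[mxx, mxy], [mxy, myy]]
    lam_minor = mean - root          # smaller eigenvalue
    theta = 0.5 * math.atan2(2.0 * mxy, mxx - myy)
    ratio = math.sqrt(lam_major / lam_minor) if lam_minor > 0 else math.inf
    return theta, ratio

# An 8 x 2 block of pixels: elongated along the x axis.
block = [(x, y) for x in range(8) for y in range(2)]
area, centroid, (mxx, mxy, myy) = region_moments(block)
theta, ratio = ellipse_from_moments(mxx, mxy, myy)
```

For the elongated block the orientation comes out along the x axis and the axis ratio is well above 1, which is the kind of coarse shape summary the text describes.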

Because of the possibility of regions being fragmented, or regions being
improperly merged, it turns out to be inappropriate to use edges in the graph to
represent physically adjacent regions. The adjacency graph generated by such a
rule is too sensitive to minor variations in the image segmentation. Instead, the
choice was made of joining each vertex to the vertices representing the N closest
regions in the segmented image. A value of N = 8 was chosen. Thus, each vertex
in the graph has eight neighbors.
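The nearest-region rule can be sketched as below. This is a minimal illustration, assuming that "closest" means Euclidean distance between region centroids (the text does not say which distance is used); `nearest_neighbor_graph` is a hypothetical name.

```python
import math

def nearest_neighbor_graph(centroids, n_neighbors=8):
    """Join each vertex to the n_neighbors regions with the closest centroids."""
    graph = {}
    for i, (xi, yi) in enumerate(centroids):
        others = [(math.hypot(xi - xj, yi - yj), j)
                  for j, (xj, yj) in enumerate(centroids) if j != i]
        others.sort()                              # nearest first, ties by index
        graph[i] = [j for _, j in others[:n_neighbors]]
    return graph

# Ten regions in a row, two neighbors each to keep the example small.
centroids = [(float(i), 0.0) for i in range(10)]
graph = nearest_neighbor_graph(centroids, n_neighbors=2)
```

Note that the resulting relation need not be symmetric: vertex 0 lists vertex 2 as a neighbor, but vertex 2 lists vertices 1 and 3.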

4 The Three-Tier Matching Method 

The reduction of the image to an attributed graph represents a significant
simplification. The graph corresponding to a typical complicated image (the search
image) may contain up to 500 or so vertices, whereas the graph corresponding
to an object to be found (the template) may contain 50 vertices or so. Thus, a
complete one-on-one comparison may be carried out in quite a short time.

The search is carried out in three phases, as follows:

1. Local comparison. A one-to-one comparison of each pair of vertices is
carried out. Each pair of vertices, one from the template graph and one from
the search graph, is assigned a score based on similarity of shape, size and
color, within rather liberal bounds.

2. Neighborhood comparison. The local neighborhood consisting of a vertex
and its neighbors in the template graph is compared with a local neighborhood
in the search graph. A score is assigned to each such neighborhood
pair, based on compatibility and the individual vertex-pair scores.

3. Global matching. A complete graph-matching algorithm is carried out,
in which promising matches identified in the stage-2 matching are pieced
together to identify a partial (or optimally a complete) graph match.

Each of these steps will be described in more detail in later sections. The
idea behind this multi-stage matching approach is to avoid ruling out possible
matches at an early stage, making the matching process robust to differences in
the segmentation and viewpoint. This approach is motivated by the scoring
method used in tennis matches, in which a three-tier scoring system is used:
game, set, match. At each stage, slight advantages are amplified. A player who
wins 55% of points will win 62% of games, 82% of sets and 95.7% of matches.
Thus, the better player will (almost) always win, despite temporary setbacks. In
the same way, the three-tier graph matching method provides a robust way of
converging to the correct match, despite local fluctuations of region-to-region
scores.
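The tennis figures can be checked with a short calculation. The sketch below assumes that points are won independently with a fixed probability, that sets are advantage sets (no tie-break), and that a match is best of five sets; these rules are assumptions made for the illustration, not something the text states.

```python
from math import comb

def win_game(p):
    """Probability of winning a game when each point is won with probability p."""
    q = 1.0 - p
    before_deuce = p**4 * (1 + 4*q + 10*q*q)            # win to love, 15 or 30
    from_deuce = comb(6, 3) * p**3 * q**3 * (p*p / (1 - 2*p*q))
    return before_deuce + from_deuce

def win_set(g):
    """Probability of winning an advantage set when each game is won with probability g."""
    h = 1.0 - g
    straight = sum(comb(5 + k, k) * g**6 * h**k for k in range(5))   # 6-0 .. 6-4
    extended = comb(10, 5) * g**5 * h**5 * (g*g / (1 - 2*g*h))       # from 5-5
    return straight + extended

def win_match(s):
    """Probability of winning a best-of-five match when each set is won with probability s."""
    t = 1.0 - s
    return s**3 * (1 + 3*t + 6*t*t)

game = win_game(0.55)
set_ = win_set(game)
match = win_match(set_)
```

Under these assumptions a 55% advantage per point grows to roughly 62% per game, 82% per set and just under 96% per match, in line with the figures quoted above.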



4.1 Local Matching 

In local matching, individual vertex pairs are evaluated. Each pair is assigned
a score based on shape and color. Recall that each region is idealized as an
ellipse. Shapes are compared on the basis of their size and eccentricity. Up to a
factor of 2 difference in size is allowed without significant penalty. This allows
for different scales in the two images, within reasonable bounds.

Because of different lighting conditions, color may differ between two images.
The most significant change in color, however, is due to a brightness difference.
To allow for this, colors are normalized before being compared. The color of a
region is represented as an RGB vector, and vectors that differ by a constant
multiple are held to represent the same color.

The cost of a local match between two vertices is denoted $C_{\mathrm{local}}$.
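Brightness-independent color comparison can be sketched as follows. The particular normalization (unit Euclidean length) and the function names are assumptions chosen for illustration; the text only says that colors are normalized before comparison.

```python
import math

def normalize_color(rgb):
    """Scale an RGB vector to unit length so that brightness is factored out."""
    norm = math.sqrt(sum(c * c for c in rgb))
    if norm == 0:
        return (0.0, 0.0, 0.0)       # black has no chromatic direction
    return tuple(c / norm for c in rgb)

def color_distance(rgb_a, rgb_b):
    """Euclidean distance between the normalized color vectors."""
    a = normalize_color(rgb_a)
    b = normalize_color(rgb_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

same_hue = color_distance((100, 50, 25), (200, 100, 50))   # constant multiple
different = color_distance((255, 0, 0), (0, 255, 0))
```

Two colors that differ only by a constant multiple are at distance zero, while pure red and pure green are as far apart as two unit vectors at right angles can be.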



4.2 Neighborhood Matching 

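Each vertex (here called the core) in the graph has eight neighbors, representing
the eight closest regions. In comparing the local neighborhood of one core vertex
$v_0$ with the local neighborhood of a potential match $v'_0$, an attempt is made to
pair the neighbor nodes of $v_0$ with those of $v'_0$. In this matching, the order of the
neighbor vertices must be preserved. Thus, let $v_1, v_2, \ldots, v_n$ be the neighbors of
one core vertex, given in cyclic angular order around the core, and let $v'_1, \ldots, v'_n$ be
the neighbors of a potential match core, similarly ordered. One seeks a subset
$S$ of the indices $\{1, \ldots, n\}$ and a subset $S'$ of the indices $\{1, \ldots, n\}$, and a one-to-one
mapping $\sigma : S \to S'$, so that the matching $v_i \leftrightarrow v'_{\sigma(i)}$ preserves cyclic order. The
total cost of a neighborhood match is equal to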

$$C_{\mathrm{nbhd}} = C_{\mathrm{local}}(v_0, v'_0) + \sum_{i \in S} w_i \, C_{\mathrm{local}}(v_i, v'_{\sigma(i)})$$

where $w_i$ is a weight between 0 and 1 that depends on the ratio of distances
between the core vertices and the neighbors $v_i$ and $v'_{\sigma(i)}$. For each pair of core
vertices $v_0, v'_0$, the neighborhood matching that maximizes this cost function is
speedily and efficiently found by dynamic programming.
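One way such an order-preserving pairing can be found by dynamic programming is sketched below. This is an illustrative stand-in rather than the authors' procedure: `scores[i][j]` is assumed to hold the already weighted, non-negative score of pairing neighbor `i` of one core with neighbor `j` of the other, and cyclic order is handled by trying every rotation of the second neighbor list.

```python
def best_ordered_matching(scores):
    """Best total score of a one-to-one pairing that preserves linear order."""
    n = len(scores)
    m = len(scores[0]) if n else 0
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(best[i - 1][j],               # leave neighbor i unmatched
                             best[i][j - 1],               # leave neighbor j unmatched
                             best[i - 1][j - 1] + scores[i - 1][j - 1])
    return best[n][m]

def best_cyclic_matching(scores):
    """Best order-preserving pairing over all rotations of the second list."""
    m = len(scores[0])
    return max(best_ordered_matching([row[s:] + row[:s] for row in scores])
               for s in range(m))

# Neighbor i pairs perfectly with neighbor (i + 1) mod 3 of the other core.
rotated = [[0, 1, 0],
           [0, 0, 1],
           [1, 0, 0]]
linear_best = best_ordered_matching(rotated)
cyclic_best = best_cyclic_matching(rotated)
```

With eight neighbors per vertex, the table has at most 9 x 9 entries and there are eight rotations, so each neighborhood comparison is cheap.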







4.3 Graph Matching 



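In previous sections, the template image and the search image were reduced to
a graph, and candidate matches between vertices in the two images were found.
The goal of this section is to generate a mutually consistent set of vertex matches
between the template and the search image. An association graph G [2] provides
a convenient framework for this process. In considering the association
graph, it is important not to confuse it with the region adjacency graphs that
have been considered so far. In the association graph, vertices represent pairs of
regions, one from each image. Such a vertex represents a hypothesized matching
of a region from the template image with a region from the search image.
Weighted edges in the association graph represent compatibilities between the
region matchings denoted by the two vertices connected by the edge.
Thus, a vertex in the association graph is given a double index, and denoted
$v_{ij}$, meaning that it represents a match between region $R_i$ in the template image
and region $R'_j$ in the search image. This match may be denoted $R_i \leftrightarrow R'_j$.
As an example, if $j_1 \neq j_2$, then $v_{ij_1}$ is not compatible with $v_{ij_2}$. This is because vertex
$v_{ij_1}$ represents a match $R_i \leftrightarrow R'_{j_1}$ and $v_{ij_2}$ represents the match $R_i \leftrightarrow R'_{j_2}$, and it
is impossible that region $R_i$ should match both $R'_{j_1}$ and $R'_{j_2}$. Thus, vertices $v_{ij_1}$
and $v_{ij_2}$ are incompatible and there is no edge joining these two vertices in the
association graph. There are other cases in which matches are incompatible. For
instance, consider a vertex $v_{ij}$ representing a match $R_i \leftrightarrow R'_j$ and a vertex $v_{kl}$
representing a match $R_k \leftrightarrow R'_l$. If regions $R_i$ and $R_k$ are close together in the
template image, whereas $R'_j$ and $R'_l$ are far apart in the search image, then the
matches $R_i \leftrightarrow R'_j$ and $R_k \leftrightarrow R'_l$ are incompatible, and so there is no edge joining
the vertices $v_{kl}$ and $v_{ij}$. Matches may also be incompatible on the grounds of
orientation or color.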

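Formally, the association graph $G = \{V, E\}$ is composed of a set of vertices
$V$ and a set of weighted edges $E \subseteq V \times V$. Each vertex $v$ represents a possible
match between a template region and a search region. If there are N template
regions and M search regions, then $V$ would have NM vertices (Figure 4). In
order to reduce the complexity of the problem, the graph G is pruned so that
only the top 5 assignments for each template region are included in $V$. The
nodes are labelled $v_{ij}$, which is interpreted as the j-th possible assignment for the
i-th template region. A slack node for each template region is inserted into the
graph. The slack node $v_{i0}$ represents the possibility of the NULL assignment for
the i-th template region, that is, no matching region exists in the other image.

If an edge $e = (v_{ij}, v_{kl})$ exists, then the assignments between nodes $v_{ij}$ and
$v_{kl}$ are considered compatible. The weights for the edges are derived from the
compatibility matrix $C$, which is defined as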



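$$C_{(ij)(kl)} = \begin{cases}
0 & \text{if } j = 0 \text{ or } l = 0 \\
0 & \text{if } i = k \text{ and } j \neq l \\
0 \text{ to } 1 & \text{if } (i, j) = (k, l) \\
0 \text{ to } 1 & \text{if } v_{ij} \text{ and } v_{kl} \text{ are compatible} \\
-N & \text{if vertices } v_{ij} \text{ and } v_{kl} \text{ are not compatible}
\end{cases}$$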




where N is the number of template regions. The value of $C_{(ij)(ij)}$ represents
the score given to the individual assignment defined by node $v_{ij}$. A subgraph
of G represents a solution to the matching problem. The choice of weight $-N$
for an incompatible match is to discriminate against incompatible matches and
make certain that a set of nodes with maximum weight represents a clique of
compatible matches.



[Figure 4 panels: Template Objects; Reference Objects; Association Graph]

Fig. 4. The template and search images are reduced to a set of regions. Each
possible pair of assignments is assigned to a node in the association graph.
Edges in the graph connect compatible assignments.



The method of determining compatibility and assigning compatibility scores
$C_{(ij)(kl)}$ to compatible matches is as follows. Consider a candidate region pair
$R_i \leftrightarrow R'_j$. The local neighborhood of region $R_i$ has been matched with the
neighborhood of $R'_j$ during the neighborhood matching stage. In doing this, neighbors
of the region $R_i$ have been matched with the neighbors of the region
$R'_j$. This matching may be considered as a correspondence of several regions
(a subset of the neighbors of $R_i$) with an equal number of regions in the other
graph. From these correspondences, a projective transformation is computed that
maps the centroid of $R_i$ to the centroid of $R'_j$, while at the same time mapping as
nearly as possible the neighboring regions of $R_i$ to their paired neighbors of $R'_j$.
Thus, the neighborhood correspondence is modelled as closely as possible by a
projective transformation of the image. Let $H$ be the projective transformation
so computed.

Now, let $R_k \leftrightarrow R'_l$ be another candidate region match. To see how well this is
compatible with the match $R_i \leftrightarrow R'_j$, the projective transformation $H$ is applied
to the region $R_k$ to see how well $H(R_k)$ corresponds with $R'_l$. As a measure of this
correspondence, the vector from $R'_j$ to $R'_l$ is compared with the vector from $R'_j$ to
$H(R_k)$. This is illustrated in Figure 5. Compatibility scores are assigned based on the
angle and length difference between the two vectors. The two assignments are
deemed incompatible if the angle between the two vectors exceeds 45° or their
length ratio exceeds 2.
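The vector test can be sketched as below. The thresholds of 45 degrees and a factor of 2 are as read from the passage above; the function name, the argument convention (both vectors measured from the centroid of the matched search region) and the treatment of zero-length vectors are assumptions made for this illustration.

```python
import math

def vectors_compatible(v_predicted, v_observed,
                       max_angle_deg=45.0, max_length_ratio=2.0):
    """True if the two displacement vectors agree in direction and length."""
    px, py = v_predicted
    ox, oy = v_observed
    len_p = math.hypot(px, py)
    len_o = math.hypot(ox, oy)
    if len_p == 0.0 or len_o == 0.0:
        return False                      # direction undefined; treat as incompatible
    cos_angle = (px * ox + py * oy) / (len_p * len_o)
    cos_angle = max(-1.0, min(1.0, cos_angle))   # guard against rounding
    angle = math.degrees(math.acos(cos_angle))
    ratio = max(len_p, len_o) / min(len_p, len_o)
    return angle <= max_angle_deg and ratio <= max_length_ratio

close = vectors_compatible((1.0, 0.0), (1.0, 0.5))       # about 27 degrees apart
perpendicular = vectors_compatible((1.0, 0.0), (0.0, 1.0))
too_long = vectors_compatible((1.0, 0.0), (3.0, 0.0))
```

A graded score between 0 and 1 could be derived from the same angle and ratio; only the hard rejection test is shown here.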








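A color compatibility score is also defined. The correspondence of a core
vertex and its neighbors with the matched configuration in the other image can
be used to define an affine transformation of color space from the one image to
the other. An affine color transformation is a suitable model for color variability
under different lighting conditions ([1]). The affine transformation defined for
one matched node pair is used to determine whether another matched node pair
is compatible.

The final compatibility score is computed as

$$C_{(ij)(kl)} = C_{\mathrm{nbhd}}(v_{ij}) \times C_{\mathrm{nbhd}}(v_{kl}) \times \text{angle compatibility score} \times \text{length ratio compatibility score} \times \text{color compatibility score}$$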




Fig. 5. Compatibility of two matches is determined by applying the transformation
$H$ defined by the neighbors of the first pair $(R_i, R'_j)$ to the region $R_k$
belonging to the second pair. The positions of $H R_k$ and $R'_l$ relative to $R'_j$ are
compared.



4.4 Solution Criteria 

The Hough transform or match filtering approaches assume that a global
transformation defined by a relatively small set of parameters can be used to map
the template regions onto the search regions. The largest set of nodes in $V$
which is consistent with a particular transformation would then constitute a
final solution. However, just because two nodes are consistent with a particular
transformation does not necessarily imply that the two nodes are consistent
with each other. For instance, in the association graph of Figure 4, a match (c, 4) is
compatible with (b, 1), and (b, 1) is compatible with (c, 3). However, (c, 3) is not
compatible with (c, 4), since c cannot simultaneously match with both 3
and 4.

A popular graphical approach which can take advantage of some of the
information contained in the edge structure is a node clustering technique, where a
simple depth-first search is used to determine the largest connected subgraph of
G. A connected graph is one in which a path of edges exists between every pair
of nodes in the graph. This solution represents a certain amount of consistency.
However, as before, the statements that node a is consistent with node b and node
b is consistent with node c do not necessarily imply that node a is consistent
with node c. This leads to the conclusion that, in order to take full advantage
of the mutual constraints embedded in the association graph, the final solution
should represent a clique on G.

A subset $R \subseteq V$ is a clique on G if $v_{ij}, v_{kl} \in R$ implies that $(v_{ij}, v_{kl}) \in E$.
The search for a maximum clique is known to be an NP-complete problem
[14]. Even after pruning, the computational cost associated with exhaustive
techniques such as [1] would be prohibitive. It has been reported [3] that
determining a maximum clique is analogous to finding the global maximum of
a binary quadratic function. Authors such as [20, 2] have taken advantage of
this idea, using relaxation and neural network methods to approximate the
global maximum of a quadratic function, where this maximum corresponds to
the largest clique in the association graph. Although the largest clique, which
relies on the information contained in $E$, ensures a high level of mutual
consistency, the nuances of the compatibility measures in $C$ are lost. In order to
take advantage of the continuous nature of the edge strengths, a quadratic
formula is specified where the global maximum corresponds to the clique that
has the maximum sum of internal edge strengths. An approach based on Gold
and Rangarajan's graduated assignment algorithm (GAA) is used to estimate the
optimal solution. The GAA is an iterative optimization algorithm which treats
the problem as a continuous process but converges to a discrete solution. Even
though the solution might be generated based on a local maximum, this solution
will be guaranteed to be a maximal clique. A maximal clique is one that is not
a proper subset of any other clique.

4.5 Binary Quadratic Formulation 

A binary solution column vector $m$ is defined such that if $m_{ij} = 1$ then $v_{ij}$ is part
of the final solution, and if $m_{ij} = 0$ then $v_{ij}$ is excluded from the final solution.
If the slack node $v_{i0}$ is part of the final solution, then the template region $i$ has
no assignment. The columns and rows corresponding to the slack nodes in the
compatibility matrix are filled with 0 entries. From a graph-theoretic point of view,
the slack nodes are connected to all other nodes with zero weight.

The binary quadratic formula $F(m)$ is defined as

$$F(m) = m^{T} C m \qquad (1)$$

where $C$ is the compatibility matrix defined in section 4.3. In order to ensure
that each template region can be mapped to at most one search region, the final
solution is constrained such that

$$\sum_{j=1}^{6} m_{ij} = 1 \quad \text{for all } i. \qquad (2)$$



A solution corresponding to a global maximum of $F(m)$ represents a set of
assignments with the largest amount of mutual compatibility. Any maximum of
$F(m)$ (global or local) represents a maximal clique on G. To show this, consider a
particular solution $m$ where there exist $i, j, k, l$ such that $m_{ij} = 1$ and $m_{kl} = 1$
but that the nodes $v_{ij}$ and $v_{kl}$ are incompatible assignments. Clearly, this is the
only condition needed for the solution $m$ not to qualify as a clique. A second
solution $\tilde{m}$ is introduced, the same as $m$ except that $\tilde{m}_{ij} = 0$ and $\tilde{m}_{i0} = 1$, which
means that region $i$ has no assignment. Using the definition of $C$ (section
4.3) and equations 1 and 2, it can be shown that the difference between $F(\tilde{m})$
and $F(m)$ is

$$F(\tilde{m}) - F(m) = 0 - 2 \sum_{q=1}^{N} \sum_{r=1}^{6} C_{(ij)(qr)} m_{qr} \ge 2\,(N - (N - 1)) = 2 \qquad (3)$$

Therefore $F(m)$ does not represent a maximum, which means that only a clique
on G can generate a maximum of $F(m)$. The next step is to find a solution
which is a maximum of $F(m)$.
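The effect of the slack substitution can be illustrated on a toy problem. The sketch below is an assumption-laden illustration, not the authors' code: nodes are keyed as `(i, j)` with `j = 0` standing for the slack node, the compatibility table is a sparse dictionary with missing entries read as 0, and the numerical scores are made up.

```python
def quadratic_score(C, m):
    """F(m) = m^T C m for a sparse compatibility table C and solution m."""
    return sum(C.get((a, b), 0.0) * m[a] * m[b] for a in m for b in m)

def set_symmetric(C, a, b, value):
    """Store a compatibility value for the node pair in both orders."""
    C[(a, b)] = value
    C[(b, a)] = value

N = 2                                           # two template regions
C = {}
set_symmetric(C, (0, 1), (0, 1), 0.8)           # individual assignment scores
set_symmetric(C, (1, 1), (1, 1), 0.7)
set_symmetric(C, (0, 1), (1, 1), -N)            # the two assignments are incompatible

# m selects both incompatible assignments; m_tilde moves region 0 to its slack node.
m = {(0, 0): 0, (0, 1): 1, (1, 0): 0, (1, 1): 1}
m_tilde = dict(m)
m_tilde[(0, 1)] = 0
m_tilde[(0, 0)] = 1

f_m = quadratic_score(C, m)
f_tilde = quadratic_score(C, m_tilde)
```

Here the solution containing the incompatible pair scores below the one that falls back to the slack node, so it cannot be a maximum, which is the point of the argument above.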



4.6 Approximating the Clique with the Largest Degree of Mutual 
Compatibility 



As previously stated, the search for the global maximum of a 0-1 quadratic equation
is known to be an NP-complete problem, so that an approximate solution to
the optimum value of $F(m)$ will have to be estimated. The GAA is a recursive
routine used to solve a general assignment problem under the constraint that
assignments must be one to one. Any binary quadratic cost function can be
used to drive the optimization process. When generating the compatibility
matrix, two nodes $v_{ij}$ and $v_{kl}$ are considered incompatible if they map template
regions $i$ and $k$ to the same search region. Inclusion of $v_{ij}$ and $v_{kl}$ in the final
solution would contradict the statement that a final solution is guaranteed to
be a maximal clique. This means that the portion of the GAA that prevents a
many-to-one condition from occurring need not be implemented.

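Initially, $m$ is treated as a continuous vector. Several constraints are placed
on the optimization process:

$$m_{ij} \ge 0 \quad \text{for all } i, j \qquad (4)$$

$$\sum_{j=1}^{6} m_{ij} = 1 \quad \text{for all } i \qquad (5)$$

During each iteration $t$, the update rule for the GAA is as follows:

$$m_{ij}(t+1) = \frac{\exp\!\left(\beta \, \dfrac{\partial F(t)}{\partial m_{ij}(t)}\right)}{\sum_{j'=1}^{6} \exp\!\left(\beta \, \dfrac{\partial F(t)}{\partial m_{ij'}(t)}\right)} \qquad (6)$$

where $\beta$ is a positive number and

$$\frac{\partial F(t)}{\partial m_{ij}(t)} = 2 \sum_{p=1}^{N} \sum_{q=1}^{6} C_{(ij)(pq)} \, m_{pq}(t)$$

The update equation 6 ensures that conditions 4 and 5 are maintained. Initially,
$\beta$ is set to a low value so that multiple solutions can coexist. The value of $\beta$
is gradually increased. As can be seen from equation 6, as $\beta$ becomes large, the
values of $m$ are forced to discrete values of 0 or 1.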
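The annealed update can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: nodes are keyed `(i, j)` with `j = 0` as the slack node, the compatibility table is a symmetric sparse dictionary, the gradient is taken as twice the weighted sum of compatibilities, and the starting value and growth rate of beta are arbitrary choices. Only the per-region normalization of equation 6 is performed.

```python
import math

def gaa_step(C, m, beta):
    """One synchronous update: per template region, a softmax of beta * dF/dm."""
    grad = {a: 2.0 * sum(C.get((a, b), 0.0) * m[b] for b in m) for a in m}
    new_m = {}
    for i in {a[0] for a in m}:
        row = [a for a in m if a[0] == i]
        top = max(beta * grad[a] for a in row)       # subtract max for numerical stability
        weights = {a: math.exp(beta * grad[a] - top) for a in row}
        total = sum(weights.values())
        for a in row:
            new_m[a] = weights[a] / total            # each row sums to 1
    return new_m

def gaa_solve(C, nodes, beta=0.5, growth=1.5, iterations=30):
    """Start from a uniform vector and raise beta gradually."""
    row_sizes = {}
    for a in nodes:
        row_sizes[a[0]] = row_sizes.get(a[0], 0) + 1
    m = {a: 1.0 / row_sizes[a[0]] for a in nodes}
    for _ in range(iterations):
        m = gaa_step(C, m, beta)
        beta *= growth
    return m

def set_symmetric(C, a, b, value):
    C[(a, b)] = value
    C[(b, a)] = value

# Toy problem: two template regions, two candidates each, plus slack nodes.
nodes = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
C = {}
set_symmetric(C, (0, 1), (0, 1), 0.8)
set_symmetric(C, (1, 1), (1, 1), 0.7)
set_symmetric(C, (0, 2), (0, 2), 0.6)
set_symmetric(C, (1, 2), (1, 2), 0.6)
set_symmetric(C, (0, 1), (1, 1), 0.9)     # mutually compatible pair
set_symmetric(C, (0, 2), (1, 2), -2.0)    # incompatible pair, weight -N with N = 2
solution = gaa_solve(C, nodes)
```

On this toy problem the vector starts uniform and ends essentially binary, selecting the mutually compatible pair of assignments.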

Figure 6 shows an example of the optimization process. A sequence of snapshots
graphically displays the evolution of the solution vector for a template
image of 15 regions. At the first iteration, the NULL assignments are
favored because of the inconsistencies between rival solutions. Between time 1
and time 3, a dominant solution begins to emerge. The solution is reinforced during
time 4 and time 5. At time 6, the algorithm has converged to a final solution, and
by time 7 the coefficients have taken on binary values.

5 Results 

The algorithm was tried on several sets of color images. The first example was
a computer manual, shown in Figure 7. The manual was easily found in different
images of a cluttered table-top, even when the manual was partially occluded.
Note that a second manual shown in the images is not found, since it is actually
a different color, though this is not obvious from the grey-scale images shown in
the paper. Other examples are shown in Figures 10 and 9.

6 Conclusion 

The amalgamation of region segmentation algorithms with modern color constancy
methods gives the possibility of improved object recognition in color and
multi-spectral images. The adoption of an inexact graph-matching approach
makes recognition independent of moderate lighting and view-point changes.

The graph matching approach was able to generate solutions with consistency
at multiple levels. The region adjacency graphs were able to highlight
image-to-template correspondences with strong local support. By insisting that the final
solution must represent a clique on the association graph, global consistency
was achieved. Although the maximum clique problem is NP-complete, it was
demonstrated that strong maximal cliques can be generated using a variation of
the graduated assignment algorithm.







[Figure 6 plot: "Evolution of Decision Vector"; eight panels, time 0 to time 7, each with rows 0 to 14]





Fig. 6. Illustration of the GAA optimization process. The coefficients for the 
solution vector m are shown at various points in time. Each row represents 
the coefficients corresponding to a particular template region. The last column 
at each time represent the coefficients for the NULL assignments. Initially the 
coefficients take on continuous values between 0 and 1. By the end of the process 
only binary values exist. 










Image ROI 



Fig. 7. The computer manual used as a template 



Fig. 8. Two examples of recognition. On the left the search image, and on the 
right the outlines of the regions matched against the template. 










Fig. 9. Recognition of cup image. On the left is the template, in the center the 
search image and on the right the identified regions of the located cup. Note that 
the cup in the search image is seen from a different angle from the template 
image. The letters REG are visible in the template, but only RE is visible in the 
search image. 




Fig. 10. Recognizing a building. On the left the template, and on the right the 
search image showing the recognized building. 










References 

[1] Ambler A.P., Barrow H.G., Brown C.M., Burstall R.M., Popplestone R.J., ‘A 
versatile computer-controlled assembly system’, IJCAI, pages 298-307, (1973). 

[2] Ballard D.H., Brown C.M., ’Computer Vision’, Prentice-Hall, Englewood Cliffs, 
NJ, (1982). 

[3] Barahona F., Jünger M., Reinelt G., ‘Experiments in Quadratic 0-1 programming’, 
Mathematical Programming, vol. 44, pages 127-137, (1989). 

[4] David H. Brainard and Brian A. Wandell, ‘Analysis of the retinex theory of color 
vision’. Journal of the Optical Society of America, Vol. 3, No 10, pages 1651 - 
1661, (1986). 

[5] J. Brian Burns and Stanley J. Rosenschein, ‘Recognition via Blob Representation 
and Relational Voting’, Proc 27th Asilomar Conference on Signals, Systems and 
Computer, pages 101 - 105, (1993). 

[6] Marie-Pierre Dubuisson and Anil K. Jain, ‘Fusing Color and Edge Information 
for Object Matching’, Proceedings, ICIP-94, pages 471 - 476, (1994). 

[7] Francois Ennesser and Gerard Medioni, ‘Finding Waldo, or Focus of Attention 
using Local Color Information’, IEEE Transactions on PAMI, Vol 17, 8, pages 
805-809, (1993). 

[8] Faugeras O., ‘Three-dimensional computer vision’, MIT Press, (1993). 

[9] G. D. Finlayson, B. V. Funt and K. Barnard, ‘Color Constancy under Varying 
Illumination’, Proceedings of 5th International Conference on Computer Vision, 
ICCV-95, pages 720 - 725, (1995). 

[10] David Forsyth, ‘A Novel Approach to Colour Constancy’, Proceedings of 2nd 
International Conference on Computer Vision, ICCV-88, pages 9 - 18, (1988). 

[11] Brian V. Funt and Graham D. Finlayson, ‘Color Constant Color Indexing’, IEEE 
Transactions on PAMI, Vol. 17, No. 5, pages 522-529, (May 1995). 

[12] Graham D. Finlayson, Mark S. Drew and Brian V. Funt, ‘Color constancy: gen- 
eralized diagonal transforms suffice’, Journal of the Optical Society of America, 
Vol. 11, No 11, pages 3011-3019, (1994). 

[13] Gold S. and Rangarajan A., ’A graduated assignment algorithm for graph matching’, 
IEEE Transactions on PAMI, Vol. 18 No 4, (April 1996), pages 377 - 387. 

[14] Gibbons A., ‘Algorithmic graph theory’, Cambridge University Press, Cambridge 
(MA), USA, (1985) 

[15] Allen R. Hanson and Edward M. Riseman, ’Segmentation of Natural Scenes’, in 
Computer Vision Systems, (edited A. Hanson and E. Riseman), Academic Press, 
(1978), pages 129 - 164. 

[16] Allen R. Hanson and Edward M. Riseman, ’VISIONS : A computer system for 
interpreting scenes’, in Computer Vision Systems, (edited A. Hanson and E. Rise- 
man), Academic Press, (1978), pages 303 - 334. 

[17] Glenn Healey and David Slater, ‘Global color constancy : recognition of objects 
by use of illumination-invariant properties of color distributions’. Journal of the 
Optical Society of America, Vol. 11, No 11, pages 3003 - 3010, (1994). 

[18] Glenn Healey and David Slater, ’Computing Illumination- Invariant Descriptors of 
Spatially Filtered Color Image Regions’, IEEE Transactions on Image Processing, 
Vol. 6 No 7, (July 1997), pages 1002 - 1013. 

[19] Glenn Healey and David Slater, ‘Exploiting an Atmospheric Model for Automated 
Invariant Material Identification in Hyperspectral Imagery’, Preprint report : to 
appear (DARPA IU Workshop, Monterey, (1998) ?). 







[20] Lin, F. ‘A parallel computation network for the maximum clique problem’, Pro- 
ceeding 1993 international symposium on circuits and systems, pages 2549-52, vol. 
4, IEEE, (May 1993). 

[21] Stephen Lin and Sang Wook Lee, ‘Using Chromaticity Distributions and 
Eigenspace Analysis for Pose, Illumination and Specularity Invariant Recognition 
of 3D objects’. Proceedings Computer Vision and Pattern Recognition, CVPR-97, 
pages 426 - 431, (1997). 

[22] Hiroshi Murase and Shree K. Nayar, ‘Visual Learning and Recognition of 3-D 
Objects from Appearance’, International Journal of Computer Vision, 14, pages 
5-24, (1995) 

[23] Adnan A. Y. Mustafa, Linda G. Shapiro and Mark A. Canter, ‘3D Object Recog- 
nition from Color Intensity Images’, Proc. ICPR’96, pages 627 - 631, (1996). 

[24] Kenji Nagao, ‘Recognizing 3D Objects Using Photometric Invariant’, Proceedings 
of 5th International Conference on Computer Vision, ICCV-95, pages 480 - 487, 
(1995). 

[25] Shree K. Nayar, Sameer A. Nene and Hiroshi Murase, ‘Real-Time 100 Object 
Recognition System’, Proc. 1996 IEEE Conference on Robotics and Automation, 
Minneapolis, pages 2321 - 2325, (April 1996). 

[26] Sameer A. Nene and Shree K. Nayar, ‘A Simple Algorithm for Nearest Neighbor 
Search in High Dimensions’, IEEE Transactions on PAMI, Vol. 19 No 9, pages 
989-1003, (Sept, 1997). 

[27] Michael J. Swain and Dana H. Ballard, ‘Color Indexing’, International Journal of 
Computer Vision, 7:1 pages 11-32, (1991). 

[28] Pelillo M., ‘Relaxation labeling Networks that solve the maximum clique prob- 
lem’, Fourth international conference on artificial neural networks, pages 166-70, 
published by IEE (June 1995). 

[29] Tushar Saxena, Peter Tu and Richard Hartley, ’Recognizing objects in cluttered 
images using subgraph isomorphism’, to appear in Proceedings of the IU Workshop, 
Monterey, (1998). 

[30] David Slater and Glenn Healey, ‘Combining Color and Geometric Information 
for the Illumination Invariant Recognition of 3D Objects’, Proceedings of 5th 
International Conference on Computer Vision, ICCV-95, pages 563 - 568, (1995). 

[31] David Slater and Glenn Healey, ‘Exploiting an Atmospheric Model for Automated 
Invariant Material Identification in Hyperspectral Imagery’, Preprint report : to 
appear (DARPA IU Workshop, Monterey, (1998) ?). 

[32] A. Zisserman, D. Forsyth, J. Mundy, G. Rothwell, J. Liu, N. Pillow, ‘3D Object 
Recognition Using Invariance’, Artificial Intelligence Journal, 78, pages 239-288, 
(1995). 




A Cooperating Strategy for Objects Recognition 



Antonio Chella, Vito Di Gesù, Ignazio Infantino, Daniela Intravaia, and Cesare Valenti

Centro Interdipartimentale di Tecnologie della Conoscenza, University of Palermo, Italy
Dipartimento di Ingegneria Elettrica, University of Palermo, Italy
Dipartimento di Matematica ed Applicazioni, University of Palermo, Italy


Abstract. The paper describes an object recognition system, based on 
the co-operation of several visual modules (early vision, object detector, 
and object recognizer). The system is active because the behavior of each 
module is tuned on the results given by other modules and by the internal 
models. This solution makes it possible to detect inconsistencies and to 
generate a feedback process. The proposed strategy has shown good performance, 
especially in the case of complex scene analysis, and it has been included in 
the visual system of the DAISY robotic system. Experimental results 
on real data are also reported. 

1 Introduction 

The performance of perceptual systems relies on the ability to focus on areas of interest, by maximizing a given cost/benefit utility criterion [2]. The ability to select salient features is one of the basic questions of intelligence, both in artificial and natural systems. Moreover, visual perceptual systems should be able to adapt their behavior depending on the current goal and the nature of the input data. Such behavior can be obtained by systems able to dynamically interact with the environment. Information-fusion strategies [4] are an example of a suitable goal-oriented approach [9]. The computation can be driven by complementary information sources, and it may evolve on the basis of adaptive internal models and environment data transformation.

An object recognition system, based on the co-operation of several visual modules, is described (Fig. 1). The main modules of the system are:

- Early vision (DST);
- Object detection (OBD), realized by two co-operating agents (segmentation (Snake) and feature extraction (OST));
- Object recognition, realized by the classifier (CL) and the structural descriptor (SD) agents.

The system is also active: the behavior of each module is tuned on the results obtained by other modules and by the internal models. The consistency control module makes it possible to generate a feedback process whenever inconsistencies are detected.



D.A. Forsyth et al. (Eds.): Shape, Contour LNCS 1681, pp. 264-274, 1999. 
© Springer- Verlag Berlin Heidelberg 1999 








Fig. 1. The scheme of the object recognition system. 



The proposed strategy has been chosen because the performance of an artificial visual system is strongly influenced by the information processing that is done during the early vision phase, to reduce the huge amount of information collected by the visual systems (both natural and artificial) and to avoid the collapse at high computational levels (Tsotsos [14]).

Treisman [15] suggests that pre-attentive processing allows pictorial computation to be focused on regions of interest. In fact, meaningful image and shape features (e.g. edges, medial axis, snakes and textures) are selected during this phase. This capability is a fundamental part of an active vision system (Aloimonos [1]).

Attention is achieved at different levels of abstraction, starting from early vision. At each level, different paradigms of computation can be considered (from local to symbolic), including state modeling (Brown [2]).

We are testing our approach on complex scene analysis by including it in the visual system of the DAISY robot system [3]. Fig. 2 shows a typical real world scene treated by the visual system. It represents a desk with several typical objects. The scene has shadows of objects, and the condition of illumination is natural.




Fig. 2. An example of a complex scene with objects on a desk. 

The paper is organized as follows. Section 2 describes the early vision task. 
The object detection module is described in Section 3. Section 4 is dedicated to 
the classifier design. Experimental results and discussion are given in Section 5. 

2 The Early Vision Phase 



In the following, $D = \|g_{i,j}\|$ denotes the input image-frame of size $N \times N$, with $0 \le g_{i,j} \le G - 1$.

An object is said to exhibit symmetry if the application of certain isometries, called symmetry operators, leaves it unchanged while permuting its parts. For instance, some letters remain unchanged under reflection, others are invariant under both reflection and half-turn, and the circle has annular symmetry around its centre. Symmetry plays a remarkable role in perception problems. For example, peaks of brain activity are measured in correspondence with visual patterns showing symmetries. The relevance of symmetry in vision was already noted by the psychologists Köhler and Wallach in [11].

Symmetry operators have been included in vision systems to perform different visual tasks. For example, they have been applied to represent and describe object-parts (Kelly [10]), and to perform image segmentation in Gauch [7]. In Reisfeld [13] a measure of symmetry is introduced to detect points of interest in a scene.

The Discrete Symmetry Transform (DST) of $D$ is here defined by the product of two local operators:

$$DST_{i,j} = F_{i,j} \times E_{i,j}$$






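The first operator is a function of the axial moments computed in a window $C_k$ of linear size $2k + 1$ and centred in $(i,j)$:

$$T^{h}_{i,j} = \sum_{r=-k}^{+k} \sum_{s=-k}^{+k} \left| r \sin\left(\frac{h\pi}{n}\right) - s \cos\left(\frac{h\pi}{n}\right) \right| \times g_{i+r,\,j+s}$$

with $h = 0, 1, 2, \dots, n - 1$, where $n$ is the number of symmetry axes used. The function $F$ depends on the kind of symmetry to detect. For example, in the case of annular symmetry: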

n In 

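The second operator weighs $F$ according to the local smoothness of the image, and it is defined as

$$E_{i,j} = \sum_{(r,s) \in C_k,\ (l,m) \in C_{k+1}} \left| g_{l,m} - g_{r,s} \right|$$

where pixels $(l,m)$ and $(r,s)$ must be 4-connected ($(l - r)^2 + (m - s)^2 = 1$). It is easy to see that $E_{i,j} = 0$ if the image is locally flat.

In Fig. 3 the DST of the scene in Fig. 2 is shown. Bright pixels represent areas of high local circular symmetry.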




Fig. 3. The DST of the scene in Fig. 2. 



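The DST is applied to the input image $D$ to compute the transformed image $S$. The mean value $\mu_S$ and the variance $\sigma_S$ are also derived by a direct estimation of the image histogram.

The indicators ($\mu_S$ and $\sigma_S$) are then used to evaluate the selection of the areas of interest by means of the rule:

$$T(DST(D), \mu_S, \sigma_S, a) = \begin{cases} DST(D) & \text{if } DST(D) > \mu_S + a \times \sigma_S \\ 0 & \text{otherwise} \end{cases}$$

where $a \ge 0$ (in our experiments $a = 3$). Fig. 4 shows the result of the selection rule on the transformed image in Fig. 3.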
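The selection rule can be illustrated with a minimal sketch. It assumes that the spread term in the threshold is the standard deviation of the transformed image, and that the image is given as a list of rows; the function name `select_areas_of_interest` and the toy image are hypothetical and are not taken from the paper.

```python
from statistics import fmean, pstdev

def select_areas_of_interest(dst, a=3.0):
    """Keep transform values above mean + a * std; set all others to zero.

    dst is a list of rows of DST values (assumed layout, for illustration).
    """
    values = [v for row in dst for v in row]
    mu = fmean(values)          # mean of the transformed image
    sigma = pstdev(values)      # population standard deviation (assumption)
    threshold = mu + a * sigma
    return [[v if v > threshold else 0.0 for v in row] for row in dst]

# Toy transformed image: flat background with a single strong symmetry peak.
S = [[0.1] * 10 for _ in range(10)]
S[4][5] = 5.0

T = select_areas_of_interest(S, a=3.0)
kept = [(i, j) for i, row in enumerate(T) for j, v in enumerate(row) if v > 0]
print(kept)
```

With a = 3 only values far above the background survive, which is the intended behaviour of a point-of-interest selector; a smaller a would keep more of the image.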




Fig. 4. Points of interest detected in the scene in Fig. 2. 



3 Object Detection 

The OBD module performs the extraction of the zones containing candidate objects; for example, in Fig. 7 the Rubik's cube is detected. The values in the detected zone are then used to compute candidate object descriptors.

The agents proposed for this task require a sequence of frame images acquired by the robot camera; this sequence represents different views around the object. Each 2D-view of the object is extracted from every frame by using the snake agent [20].

The Object Symmetry Transform (OST agent) [6] is then applied on each 2D-view.



[Figure 5 diagram: frame, Snake Agent, 2D-view O_i, Symmetry Agent, OST(O)]



Fig. 5. Object detection module. 







It must be pointed out that during this phase the OBD is driven also by the scene model knowledge, which is built by taking into account the specific goal (in our case, look at the objects on the desk).

3.1 Snakes and Segmentation 

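A snake is a deformable curve that moves in the image under the influence of forces related to the local distribution of the grey levels. When the snake reaches an object contour, it is adapted to its shape. In this way it is possible to extract the object shape of the image view. A snake, as an open or closed contour, is described in a parametric form by

$$v(s) = (x(s), y(s))$$

where $x(s)$, $y(s)$ are the x-y co-ordinates along the contour and $s \in [0, 1]$ is the normalized arc length. The snake model [16] defines the energy of a contour, named the snake energy $E_{snake}$, to be

$$E_{snake} = \int_0^1 \left( E_{int}(v(s)) + E_{image}(v(s)) \right) ds$$

The energy integral is a functional, since its independent variable is a function. The internal energy $E_{int}$ is formed from a Tikhonov stabilizer [20] and is defined as

$$E_{int}(v(s)) = \alpha(s) \left\| \frac{dv(s)}{ds} \right\|^2 + \beta(s) \left\| \frac{d^2 v(s)}{ds^2} \right\|^2$$

where $\| \cdot \|$ is the Euclidean norm. The first order continuity term, weighted by $\alpha(s)$, makes the contour behave elastically, whilst the second order curvature term, weighted by $\beta(s)$, makes it resistant to bending. For example, setting $\beta(s) = 0$ at a point allows the snake to become second-order discontinuous at that point and develop a corner.

The image functional determines the features which will have a low image energy, and hence the features that attract the contours. In general [16, 19] this functional is made up of three terms:

$$E_{image} = w_{line} E_{line} + w_{edge} E_{edge} + w_{term} E_{term}$$

where $w$ denotes a weighting constant. Each of $w$ and $E$ corresponds to lines, edges and terminations respectively. The snake used in this framework has only the edge functional, which attracts the snake to points at high gradient:

$$E_{image} = E_{edge} = -\left( G_\sigma * \nabla^2 D(x, y) \right)^2$$

This is the image functional proposed by [16]. It is a scale based edge operator that increases the locus of attraction of the energy minimum. $G_\sigma$ is a Gaussian of standard deviation sigma, which controls the smoothing process prior to the edge operator. Minima of $E_{edge}$ lie on zero-crossings of $G_\sigma * \nabla^2 D(x, y)$, which define edges in Marr-Hildreth theory [17]. Scale space filtering is employed, which allows the snake to come into equilibrium on a heavily filtered image; then the level of filtering is reduced, increasing the locus of attraction of a minimum.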

The implemented snake makes it possible to extract the object shape in a simple way and in a short time. For every frame of the sequence, this segmentation makes it possible to individuate a 2D-view of an object (or objects) $O$ in the image. In our case the views are gathered moving around the vertical axis of $O$. Every 2D-frame $O_i$ corresponds to an angle of view given by $\phi_i = i \times \Delta\phi$ for $0 \le i \le m - 1$ (Fig. 6a). It will then be processed by the agent OST, which computes symmetry indicators for different axes of symmetry of the object.




Fig. 6. The OST transform: a) 2D-frame views; b) computation of the axial 
symmetry indicators on a given view. 



Fig. 7 shows the result of the application of the snake agent to one view of 
the Rubik's cube. 



3.2 Object Axial Symmetries and Features Extraction 

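One of the main problems is the computational complexity related to the search of the axes of symmetry in grey level images; in fact it can be considered an optimization problem. In Kiryati [12] an optimization procedure based on genetic algorithms is proposed. In this approach a Gaussian window is used to select the area of interest where to compute the symmetry axes. The time complexity of the algorithm is $O(N \times N)$ for an image $D$ of linear size $N$.

In [6] a new generalized measure of axial symmetry, $MS_\theta$, is defined. Its definition in the real plane is based on the following indicator, computed for a given axis $r$ with direction $\theta$ and passing through the barycentre of the 2D-frame selected by the snake-agent, $O_i$:

$$MS_\theta(O_i) = 1 - A_\theta(O_i)$$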








Fig. 7. Candidate object detected by the snake agent. 
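where

$$A_\theta(O_i) = \frac{\int_{C_i} \left| O_i(p) - O_i(p^s) \right| h(p)\, dp}{(G - 1) \int_{C_i} h(p)\, dp}$$

with $p^s$ taken here as the point symmetric to $p$ with respect to $r$, and where $C_i$ is the support of $O_i$. The term $(G - 1) \int_{C_i} h(p)\, dp$ normalizes $MS_\theta(O_i)$ in the interval $[0, 1]$, while $h > 0$ is a function of the distance $d$ of the point $p$ from the axis $r$.

Note that the choice of $h$ depends on the influence that is given to the distance $d$ between the pixel and the given axis. Examples of h-functions are:

a) h(p) = d 
b) h(p) = 1/d 
c) h(p) = 
d) h(p) = 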

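The algorithm to find candidate axes of symmetry can be easily implemented in the discrete case. First of all the barycentre $b$ of the object is determined, then the measure $MS_\theta(O_i)$ is performed for $\theta = \frac{k\pi}{n}$ with $k = 0, 1, \dots, n - 1$. Maxima of $MS_\theta$ are also candidates to be axes of symmetry of $O_i$.

It is easy to see that the computation time of this task is $O(n \times N^2)$.

The OST computation is performed by computing the axial symmetry of different object views (Fig. 6b). It follows that

$$OST_{(k,i)}(O) = MS_k(O_i)$$

It turns out that the OST is an image of dimension $n \times m$ that represents the views of a 3D-object. In the following we will refer to it as the OST-representation of $O$.

Fig. 8 shows the result of the application of the OST to the Rubik's cube.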







Fig. 8. The OST of the Rubik's cube detected in Fig. 7. 

4 The Object Recognizer 

In the following the classifier agent is described. In our case a Bayesian classifier has been considered. Let $P_1, P_2, \dots, P_L$ be the set of object prototypes, already represented by their OST. The classifier assigns an unknown object $x$ to the class represented by the closest prototype:

$$x \in Class_i \iff \rho(x, P_i) = \min_{1 \le k \le L} \{ \rho(x, P_k) \}$$

where $\rho$ is a similarity function. The similarities used in our case are the normalized correlation (NC) (pseudo-metric), the Euclidean (ED) and the Hamming (HD) distances. All of them are defined in the interval $[0, 1]$.

Prototypes are generated by synthetic shapes (parallelogram (PA), cone (CO), cube (CU), cylinder (CY), pyramid (PY), ellipsoid (EL), sphere (SP), torus (TO) and pot-like (PL)) representing sketches of real objects on the desk (Rubik's cube (RC), pen-holder (PH), paper-weight (PW) and mug (BM)).

In Table 1 the correspondence between prototypes and real objects is shown; this corresponds to a simplified world model.

Table 2 shows the classification results for the Rubik's cube. Note that all the similarity functions led to a correct classification.
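As a rough illustration of the closest-prototype rule, the sketch below classifies a flattened descriptor by the mean of two normalized distances. Everything here is an assumption made for the example: the descriptors are short made-up vectors rather than real OST images, the normalized correlation is omitted, and the names `euclidean`, `hamming` and `classify` are hypothetical.

```python
from math import sqrt

def euclidean(u, v):
    """Root-mean-square difference; stays in [0, 1] for entries in [0, 1]."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)) / len(u))

def hamming(u, v, tol=0.5):
    """Fraction of entries that disagree after binarising at tol."""
    return sum((a > tol) != (b > tol) for a, b in zip(u, v)) / len(u)

def classify(x, prototypes, distances):
    """Return the label whose prototype minimises the mean of the distances."""
    def vote(p):
        return sum(d(x, p) for d in distances) / len(distances)
    return min(prototypes, key=lambda label: vote(prototypes[label]))

# Hypothetical flattened descriptors for three prototype shapes.
prototypes = {
    "CU": [0.9, 0.1, 0.9, 0.1],
    "CY": [0.9, 0.9, 0.9, 0.9],
    "SP": [0.5, 0.5, 0.5, 0.5],
}
x = [0.8, 0.2, 0.85, 0.15]   # unknown object, closest to the cube pattern

label = classify(x, prototypes, [euclidean, hamming])
print(label)
```

Averaging several distances before taking the minimum is one simple way to combine them; the choice of threshold used to binarise values for the Hamming distance is illustrative only.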

5 Experiments and Discussion 

The object-recognition system has been tested on real world scenes. However, 
in this preliminary experimentation the kinds of objects in the scene have been 







Object 


Model 


RC 

PH 

PW 

BM 


cube 

cylinder parallelogram ellipsoid 
cone pyramid sphere 
pot-like 



Table 1. Object-model correspondence. 



P     PA    CO    CU    CY    EL    PY    SP    PL    TO
NC    0.14  0.40  0.05  0.20  0.34  0.25  0.42  0.23  0.30
ED    0.21  0.16  0.05  0.0   0.12  0.21  0.1   0.1   0.1
HD    0.16  0.12  0.03  0.04  0.16  0.1   0.15  0.12  0.13

Table 2. Classification results for the Rubik's cube. 



limited to those included in the object data-base. Table 3 summarizes the classification results obtained using the similarity functions with a vote strategy that computes the mean value of the three distance functions from the prototypes. It is evident that the maximum of similarity has been reached in correspondence with the model. Note that in the reported examples in Table 3 the paper-weight is spherical and the pen-holder is cylindrical.



Object  PA    CO    CU    CY    EL    PY    SP    PL    TO
RC      0.1   0.22  0.04  0.0   0.13  0.25  0.19  0.1   0.19
PH      0.10  0.1   0.11  0.03  0.13  0.21  0.16  0.11  0.19
PW      0.26  0.1   0.15  0.15  0.19  0.22  0.08  0.2   0.23
BM      0.13  0.21  0.1   0.14  0.15  0.30  0.19  0.08  0.14

Table 3. Objects classification. 



A real experimentation allowing the presence of any kind of objects in the scene should be performed to test the robustness of our approach. Nevertheless, the examples shown in this paper are quite realistic and could be a good starting point for controlled environment systems.

The system has been implemented in the DAISY distributed environment. In the case of images of size 26 × 26 the time to analyze a single 2D-view was of 3 sec (from the acquisition of the frame to the computation of the OST) and a few msec to perform the classification. The system is able to perform real time computation; in fact the acquisition rate is of about ... sec. (this time includes the robot movement around the object).

Further work will be done to consider views of the object different from orthogonal ones. Prototype generation should consider more sophisticated object modeling.






References 

1. Y. Aloimonos, I. Weiss, A. Bandyopadhyay, "Active vision", in Int. Journal of Computer Vision, Vol. 1, No. 4, pp. 333-356, 1988.

2. C. M. Brown, "Issues in Selective Perception", in Proc. ICPR'92 (IEEE Computer Soc.), Vol. 1, pp. 21-30, The Hague, 1992.

3. A. Chella, V. Di Gesù, S. Gaglio, et al., "DAISY: a Distributed Architecture for Intelligent SYstem", Proc. CAMP-97 (IEEE Computer Soc.), Boston, 1997.

4. V. Di Gesù, G. Gerardi, D. Tegolo, "M-VIF: a Machine-Vision based on Information Fusion", proceedings of CAMP'93 (eds. M.A. Bayoumi, L.S. Davis, K.P. Valavanis), IEEE Comp. Soc. Press, pp. 428-435, New Orleans, 1993.

5. V. Di Gesù, C. Valenti, "Detection of regions of interest via the Pyramid Discrete Symmetry Transform", in Advances in Computer Vision (Solina, Kropatsch, Klette and Bajcsy editors), Springer-Verlag, 1997.

6. V. Di Gesù, D. Intravaia, "A new approach to face analysis", - -05/9 , University of Palermo, 199 .

7. J.M. Gauch, S.M. Pizer, "The intensity axis of symmetry and its application to image segmentation", IEEE Trans. PAMI, Vol. 15, No. 8, pp. 753-770, 1993.

8. E. Geraniotis, Y.A. Chau, "Robust data fusion for multisensor detection systems", IEEE Trans. on Information Theory, 36(6), pp. 1265-1279, 1990.

9. Gomaa, "Configuration of distributed heterogeneous information systems", Proceedings of Second International Workshop on Configurable Distributed Systems (Cat. No. 94TH0651-0), Pittsburgh, PA, USA, pp. 210, Carnegie Mellon Univ., 21-23 March 1994, IEEE Comput. Soc. Press, Los Alamitos, CA, USA, 1994.

10. M.F. Kelly, M.D. Levine, "From symmetry to representation", Technical Report TR-CIM-94-12, Centre for Intelligent Machines, McGill University, Montreal, Canada, 1994.

11. W. Köhler, H. Wallach, "Figural after-effects: an investigation of visual processes", Proc. Amer. Phil. Soc., Vol. 88, 269-357, 1944.

12. Kiryati, "Detecting symmetry in grey level images: the global optimization approach", 1996.

13. D. Reisfeld, H. Wolfson, Y. Yeshurun, "Context free attentional operators: the generalized Symmetry Transform", Int. Journal of Computer Vision, Vol. 14, 119-130, 1995.

14. J.K. Tsotsos, "The complexity of perceptual search tasks", in Proc. IJCAI, 1571-1577, 1989.

15. A. Treisman, "Preattentive processing in vision", Computer Vision, Graphics, and Image Processing, Vol. 31, 156-177, 1985.

16. M. Kass, A. Witkin, D. Terzopoulos, "Snakes: active contour models", International Journal of Computer Vision, Vol. 1, 321-331, 1988.

17. D. Marr, E. Hildreth, "Theory of edge detection", Proc. R. Soc. Lond. B, Vol. 207, 187-217, 1980.

18. V. Torre, T. Poggio, "On edge detection", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 8(2), 147-163, 1986.

19. K.F. Lai, R.T. Chin, "Deformable contours: modelling and extraction", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 17(11), 1084-1090, 1995.

20. D. Terzopoulos, "Regularization of inverse visual problems involving discontinuities", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 8(4), 413-424, 1986.




Model Selection for Two View Geometry 

A Review 



Philip H.S. Torr 



Microsoft Research, One Microsoft Way, Redmond, WA 98052, USA, 
philtorr@microsoft.com, 

http://www.research.microsoft.com/research/vision/ 



Abstract. Computer vision often involves the estimation of models of 
the world from visual input. Sometimes it is possible to fit several dif- 
ferent models or hypotheses to a set of data, the choice of exactly which 
model is usually left to the vision practitioner. This paper explores ways 
of automating the model selection process, with specific emphasis on the 
least squares problem, and the handling of implicit or nuisance parame- 
ters (which in this case equate to 3D structure). The statistical literature 
is reviewed and it will become apparent that although no one method 
has yet been developed that will be generally useful for all computer vi- 
sion problems, there do exist some useful partial solutions. This paper is 
intended as a pragmatic beginner’s guide to model selection, highlighting 
the pertinent problems and illustrating them using two view geometry 
determination. 



1 Introduction 

Robotic vision has its basis in geometric modeling of the world, and many vision algorithms attempt to estimate these geometric models from perceived data. Usually only one model is fitted to the data. But what if the data might have arisen from one of several possible models? In this case, the fitting procedure needs to account for all the potential models and select which of these fits the data best. This is the task of robust model selection which, in spite of the many recent developments in the application of robust fitting methods within the field of computer vision, has been, by comparison, quite neglected.

This paper reviews current statistical methods in model selection with respect to determining the two view geometric relations from the point matches between two images of a scene, e.g. the fundamental matrix [13, 19]. These relations can be used to guide matching [52, 60] and then estimate structure [4] or segmentation [51]. There are several two view relations that could describe an image pair, and it is necessary to estimate the type of model as well as the parameters of the model.

The paper is laid out as follows. Section 2 describes four two view motion models as well as their associated degrees of freedom. Section 3 describes the maximum likelihood method for estimation. The use of just maximum likelihood estimation will always lead to the most general model being selected as 



D.A. Forsyth et al. (Eds.): Shape, Contour LNCS 1681, pp. 277-301, 1999. 
© Springer- Verlag Berlin Heidelberg 1999 







most likely. Therefore Section 4 introduces the likelihood ratio test for comparing two models. Although not generally useful for comparison amongst multiple models, its exposition will provide insight into the failings of some of the scoring criteria explained later. Section 5 describes the AIC criterion and some variants of it, with specific emphasis on the least squares fitting problem. It is found that the AIC consistently overestimates the number of parameters in the model; the reason for this is explained and some modifications are suggested to compensate. In Section 6 the Bayesian approach is detailed; the main problem with the Bayesian approach is the specification of the priors, and the BIC (Bayesian information criterion) approximation is discussed. In Section 7 minimum description length ideas for providing priors are only briefly touched on, as they lead to the same sort of algorithms as the Bayesian approach. Finally the benefits of model averaging are outlined in Section 8. Results are spread throughout the text and summarized in Section 9. The discussion in Section 10 covers some commonly asked questions about model selection in computer vision, and in the conclusion it will be seen that this is far from a solved problem.

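Notation: A 3D scene point projects to $x$ and $x'$ in the first and second images, where $x = (x_1, x_2, x_3)^\top$ is a homogeneous three vector. The inhomogeneous coordinate of an image point is $(x, y) = (x_1/x_3, x_2/x_3)$. The correspondence will also be represented by the vector $m = (x, y, x', y')^\top$; the set of all point matches between two views will be denoted by $D$. Noise free (true) data will be denoted by an underscore $\underline{x}$, estimates $\hat{x}$, noisy (i.e. measured) data as $x$. The probability density function (p.d.f.) of $x$ given $y$ is $\Pr(x \mid y)$. $R$ is a two view relation, and $\theta$ are the parameters of that relation.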



2 Putative Motion Models 

This section describes the putative motion models which can constrain the rigid motion of points between two views. Each motion model is a relation $R$ described by a set of parameters $\theta$ which define one or more implicit functional relationships $g(m; \theta) = 0$ ($0$ is the zero vector) between the image coordinates, i.e. $g_i(m; \theta) = 0$.

There are four types of relations described in this section. Firstly, the relations can be divided between motions for which camera position and structure can be recovered (3D relations), and motions for which it cannot, for instance when all the points lie on a plane, or the camera rotates about its optic centre (2D relations). A second division is between projective and orthographic (affine) viewing conditions. (A more complete taxonomy is given in [57].)

Suppose that the viewed features arise from a 3D object which has undergone a rotation and non-zero translation. After the motion, the set of homogeneous image points $\{x_i\}$, $i = 1, \dots, n$, is transformed to the set $\{x'_i\}$. The two sets of features are related by the implicit functional relationship $x_i'^\top F x_i = 0$, where $F$ is the rank 2, 3 × 3 [13, 19] fundamental matrix; this is relation $R_1$. The fundamental matrix encapsulates the epipolar geometry. It contains all the information on camera motion and internal camera parameters available from image feature correspondences alone.

When there is degeneracy in the data such that a unique solution for $F$ cannot be obtained, it is desirable to use a simpler motion model, for instance when the camera is only rotating about its optic centre. Three other models are considered: $R_2$, which is the affine camera model of Mundy and Zisserman [38] with linear fundamental matrix $F_A$. The affine camera is applicable when the data is viewed under orthographic conditions and gives rise to a fundamental matrix with zeros in the upper 2 by 2 submatrix ^1. The homography $x' = Hx$ is relation $R_3$, and the affinity $x' = H_A x$ is relation $R_4$; these arise when the viewed points all lie on a plane or the camera is rotating about its optic centre between images, the homography being in the projective case, the affinity in the orthographic.



Relation        c   k   d   Constraint          Parameters
General F       7   7   3   x'^T F x = 0        F = [f1 f2 f3; f4 f5 f6; f7 f8 f9]
Affine F_A      4   4   3   x'^T F_A x = 0      F_A = [0 0 g1; 0 0 g2; g3 g4 g5]
Homography H    4   8   2   x' = H x            H = [h1 h2 h3; h4 h5 h6; h7 h8 h9]
Affinity H_A    3   6   2   x' = H_A x          H_A = [a1 a2 a3; a4 a5 a6; 0 0 a7]

Table 1. A description of the reduced models that are fitted to degenerate sets 
of correspondences. c is the minimum number of correspondences needed in a 
sample to estimate the constraint, k is the number of parameters in the relation; 
d is the dimension of the constraint. 
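To make the first constraint in the table concrete, the following sketch checks the epipolar relation for a synthetic match. It assumes calibrated cameras with identity rotation, so that the fundamental matrix reduces to the skew-symmetric matrix of the translation; the helper names `skew` and `epipolar_residual` and the numerical values are hypothetical choices for illustration.

```python
def skew(t):
    """Skew-symmetric matrix [t]_x such that [t]_x v equals the cross product t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty],
            [tz, 0.0, -tx],
            [-ty, tx, 0.0]]

def epipolar_residual(F, x, x_prime):
    """Evaluate x'^T F x for homogeneous three vectors x and x'."""
    Fx = [sum(F[r][c] * x[c] for c in range(3)) for r in range(3)]
    return sum(x_prime[r] * Fx[r] for r in range(3))

# Pure translation, identity calibration (assumed): F = [t]_x.
t = (1.0, 0.0, 0.0)
F = skew(t)

# Scene point X seen by cameras [I | 0] and [I | t].
X = (0.5, -0.2, 4.0)
x = (X[0] / X[2], X[1] / X[2], 1.0)
x_prime = ((X[0] + t[0]) / X[2], (X[1] + t[1]) / X[2], 1.0)

residual = epipolar_residual(F, x, x_prime)          # consistent match
bad_residual = epipolar_residual(F, x, (0.375, 0.3, 1.0))  # inconsistent match
print(residual, bad_residual)
```

A true correspondence gives a residual of zero, while a point moved off its epipolar line does not; with noisy data the residual is only approximately zero, which is what motivates the likelihood treatment that follows.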



Model Complexity: Following the maxim of Occam, “Entities are not to be 
multiplied without necessity” ^2, model selection typically scores models by a 
cost function that penalizes models with more parameters. It is convenient 

^1 Actually F_A occurs in the non-orthographic case when the optical planes of the 
two cameras coincide [50]. Triggs claims (personal communication) that affine 
reconstruction in this case gives projectively correct results. 

^2 In fact Occam did not actually say this, but said something which has much the 
same effect, namely: ‘It is vain to do with more what can be done with fewer’. That 
is to say, if everything in some science (here computer vision) can be interpreted 
without assuming this or that entity, there is no ground for assuming it [44]. 







at this point to introduce the number of explicit degrees of freedom for the parameters of each model. The fundamental matrix has 7 degrees of freedom, the homography has 8, and yet the fundamental matrix is more general. The affine fundamental matrix has 4 degrees of freedom, the affinity 6; again the affine fundamental matrix is more general. This seeming paradox can be resolved by considering the dimension of the model.

In addition to the degrees of freedom in the parameters, we shall see that the complexity of a model is also determined by its dimension, which is defined now. Each pair of corresponding points $x$, $x'$ defines a single point $m$ in a measurement space $\mathbb{R}^4$, formed by joining the coordinates in each image. These image correspondences, which are induced by a rigid motion, have an associated algebraic variety $V$ in $\mathbb{R}^4$. The fundamental matrix and affine fundamental matrix for two images are dimension 3 varieties of degree 4 (quartic) and 1 (linear) respectively. The homography and affinity between two images are dimension 2 varieties of degree 2 (quadratic) and 1 (linear) respectively. The properties of the relations $R$ are summarized in Table 1. This loosely speaking means that the fundamental matrix describes a three dimensional surface in $\mathbb{R}^4$ and the homography a two dimensional surface. Each point on this fundamental matrix surface (affine or general) is simply a match between two of the images, and it has three degrees of freedom, equivalent to the fact that it maps to a three dimensional point in the scene. Similarly each point on a homography (or affinity) surface represents a match with two degrees of freedom.

3 Maximum Likelihood Estimation 

Within this section the maximum likelihood estimate $\hat{\theta}$ of the parameters $\theta$ of a given relation $R$ is non-rigorously derived. Although this is a standard result, the derivation will reveal that there are more parameters to consider in the model formulation than just the explicit parameters of $R$ given in the last section and Table 1. These additional parameters are sometimes referred to as nuisance parameters [47]. This is important, as later it will be seen that the prior distribution of $R$ is related to the number of parameters that need to be estimated. Furthermore, deriving the maximum likelihood error from first principles is a useful exercise, as there is a long history of researchers using ad hoc error measures to estimate multiple view relations which are suboptimal.

Let $m$ be a vector of the observed coordinates, $m = (x, y, x', y')^\top$. It is assumed that the noise on $m$ is Gaussian: $m = \underline{m} + \epsilon$, with covariance matrix $\Lambda$. The covariance matrix for a set of correspondences, represented by a stacked vector $\kappa = (m_1^\top, \dots, m_n^\top)^\top$, is $\Lambda_\kappa = \mathrm{diag}(\Lambda_1, \dots, \Lambda_n)$. In this paper it is assumed that the noise in the location of features in all the images is Gaussian on each image coordinate with zero mean and uniform standard deviation $\sigma$, thus $\Lambda_\kappa = I\sigma^2$ (extension to the more general case is not difficult and is described by Kanatani [26]). Recall that the true value $\underline{m}$ of $m$ satisfies the $q$ implicit functional relationships (e.g. algebraic polynomials; $q = 1$ for dimension 3 varieties, and $q = 2$ for dimension 2 varieties in two images):




Model Selection for Two View Geometry: A Review 281 



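$$ g_i(\underline{m}; \theta) = 0, \qquad i = 1, \ldots, q. \qquad (1) $$

Given $\theta$, $\underline{\kappa}$ and $\sigma^2$ (the last being the estimated variance of the noise, which
typically can be independently derived from the properties of the feature matcher),
the probability density function of a set of observed correspondences is

$$ \Pr(\kappa \mid \mathcal{R}, \theta, \underline{\kappa}, \sigma^2) = \left(\frac{1}{\sqrt{2\pi\sigma^2}}\right)^{4n} \exp\left(-\frac{(\kappa - \underline{\kappa})^\top(\kappa - \underline{\kappa})}{2\sigma^2}\right), \qquad (2) $$

where $g(\underline{\kappa}; \theta) = 0$. To find the maximum likelihood solution, the negative
log-likelihood

$$ -L = -\log \Pr(\kappa \mid \mathcal{R}, \theta, \underline{\kappa}, \sigma^2) \qquad (3) $$

is minimized subject to the restrictions of $g$. To accomplish this Lagrange
multipliers are used: the derivatives of

$$ -L = 2n \log \sigma^2 + \frac{1}{2\sigma^2}(\kappa - \underline{\kappa})^\top(\kappa - \underline{\kappa}) + \lambda^\top g(\underline{\kappa}; \theta) \qquad (4) $$

(constant terms omitted) with respect to $\sigma^2$, $\underline{\kappa}$ and $\theta$ [28] are equated to zero.
These equations are [47]

$$ \frac{2n}{\sigma^2} - \frac{1}{2\sigma^4}(\kappa - \underline{\kappa})^\top(\kappa - \underline{\kappa}) = 0, $$
$$ -\frac{1}{\sigma^2}(\kappa - \underline{\kappa}) + S^\top \lambda = 0, $$
$$ T^\top \lambda = 0, $$
$$ g(\underline{\kappa}; \theta) = 0, $$

where $S$ and $T$ are the Jacobians of the implicit functional relationships $g(\underline{\kappa}; \theta)$
with respect to $\underline{\kappa}$ and $\theta$ respectively: $S = \partial g / \partial \underline{\kappa}$, $T = \partial g / \partial \theta$.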

The solution of this set of equations gives the maximum likelihood estimates of the
noise $\hat{\sigma}$, the parameters $\hat{\theta}$ for the relation, and the best fitting correspondences
$\hat{\kappa}$. Assuming $\sigma$ is given, the number of free parameters in this system is the
number of degrees of freedom $k$ in $\theta$ given in table 1, plus the number of degrees
of freedom in $\hat{\kappa}$. Each correspondence $\hat{m}$ obeys the constraints given by $g$ and
hence lies in the variety defined by $\mathcal{R}$, $\theta$, such that it is the least squares
distance in $\mathbb{R}^4$ away from the observed match $m$. Thus each correspondence $\hat{m}$ has $d$
degrees of freedom as given by table 1, and the number of degrees of freedom in $\hat{\kappa}$
is $nd$ (the extra number of nuisance parameters, although in this problem these
parameters are far from being nuisances, as they implicitly define the estimated
structure of the scene). The total number of parameters to be estimated (excluding
$\sigma$) is $p = k + nd$, and the total number of observations is $N = 4n$, both of which
are important for the derivation of confidence intervals in the next section.

Footnote: At this juncture the selection of the functional form of the data, i.e. the
most appropriate motion model $\mathcal{R}$, is assumed known.




282 Philip H.S. Torr 



Given that $\Lambda_\kappa = I\sigma^2$, the negative log likelihood (3) of all the correspondences
$m_i$, $i = 1, \ldots, n$, where $n$ is the number of correspondences, is

$$ -L = \sum_{i=1}^{n} \sum_{j=1,2} \left[ \left(\frac{\hat{x}_i^j - x_i^j}{\sqrt{2}\,\sigma}\right)^2 + \left(\frac{\hat{y}_i^j - y_i^j}{\sqrt{2}\,\sigma}\right)^2 \right] = \sum_{i=1}^{n} \frac{e_i^2}{2\sigma^2}, \qquad (5) $$

where the log likelihood of a given match is $e_i^2 / (2\sigma^2)$, discounting the
constant terms (which is equivalent to the reprojection error of Hartley and
Sturm [20]). If the type of relation $\mathcal{R}$ is known then, observing the data $\kappa$, we can
estimate the parameters of $\mathcal{R}$ to minimize this log likelihood. This inference is
called 'Maximum Likelihood Estimation' (Fisher 1936 [14]). Numerical methods
for finding these two view relations are given in [5, 53, 56]. To enforce the
constraints on the parameters, such as $\det F = 0$, sequential quadratic programming
(SQP) [16] is used, a state of the art method for solving constrained minimization
problems, which has been found to outperform many other methods in
terms of efficiency, accuracy, and percentage of successful solutions, over a large
number of test problems [45].

Robust Estimation. The above derivation assumes that the errors are Gaussian.
Often, however, features are mismatched and the error on $m$ is not Gaussian.
Thus the error is modelled as a mixture model of a Gaussian and a uniform
distribution:

$$ \Pr(e) = \gamma \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{e^2}{2\sigma^2}\right) + (1 - \gamma)\frac{1}{v}, \qquad (6) $$

where $\gamma$ is the mixing parameter, $v$ is just a constant, and $\sigma$ is the standard
deviation of the error on each coordinate. To correctly determine $\gamma$ and $v$ entails
some knowledge of the outlier distribution; here it is assumed, without a priori
knowledge, that the outlier distribution is uniform, with $-v/2, \ldots, +v/2$ being the pixel
range within which outliers are expected to fall (for feature matching this is
dictated by the size of the search window for matches). Usually the model selection
methods are derived under Gaussian assumptions, but this assumption is not
necessary, and the methods are equally valid using the robust function for
probability given above (e.g. see Ronchetti in [17], or [41]). Thus in all that follows the
robust likelihood is used. Rather than minimize (6), it is often computationally
more simple to minimize a robust function [23] of the form

$$ \rho\left(\frac{e^2}{\sigma^2}\right) = \begin{cases} e^2/\sigma^2 & \text{if } e^2/\sigma^2 < \lambda_3 r, \\ \lambda_3 r & \text{if } e^2/\sigma^2 \ge \lambda_3 r, \end{cases} \qquad (7) $$

where $r$ is the number of degrees of freedom in $e$ (2 for H, 1 for F), and $\lambda_3 = 4$.
The threshold $\lambda_3 = 4$ corresponds to the 95% confidence level. This means that
an inlier will only be incorrectly rejected 5% of the time.

This form of the function has several advantages. Firstly, it provides a clear
dichotomy between inliers and outliers. Secondly, outliers to a given model are
given a fixed cost, reflecting that they probably arise from a diffuse or uniform
distribution, the log likelihood of which is a constant, whereas inliers conform
to a Gaussian model. Furthermore, if the outliers follow a sufficiently diffuse
uniform distribution then they will only be incorrectly flagged as inliers a small
percentage of the time (false positives).

The robust cost function (7) allows the minimization to be conducted on all
correspondences, whether they are outliers or inliers. Use of the robust function
limits the effects of outliers on the minimization. Typically, as the minimization
progresses many outliers are redesignated inliers.
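As a minimal sketch of a truncated cost of the kind described by (7), the Python
fragment below caps the normalised squared residual of each match. It assumes
that the cap is lambda_3 * r (one reading of (7)) and that squared residuals are
already available; the function names `robust_cost` and `total_robust_cost` are
illustrative and not taken from the paper.

```python
def robust_cost(e2, sigma=1.0, r=1, lam3=4.0):
    """Truncated quadratic: normalised squared residual, capped at lam3 * r.

    Assumption: the cap scales with r, the number of degrees of freedom of the
    residual (2 for H, 1 for F).
    """
    return min(e2 / sigma ** 2, lam3 * r)


def total_robust_cost(squared_residuals, sigma=1.0, r=1, lam3=4.0):
    """Sum the capped cost over all correspondences, inliers and outliers alike."""
    return sum(robust_cost(e2, sigma, r, lam3) for e2 in squared_residuals)


# Three plausible inliers and one gross mismatch (made-up numbers).
residuals_sq = [0.5, 1.2, 3.9, 250.0]
print(total_robust_cost(residuals_sq))
```

The gross mismatch contributes only the fixed cost, so it cannot dominate the
minimization in the way it would under a plain sum of squares.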

Problems with MLE: If the type of relation $\mathcal{R}$ is unknown then maximum
likelihood estimation cannot be used to decide the form of $\mathcal{R}$, as the most general
model will always be most likely, i.e. have lowest $-L$. In table 2 the average sum
of squares of this error (SSE) is shown for 100 sets of 100 synthetic matches.
The matches were generated to be consistent with random F, Fa, H or Ha
constraints, i.e. with general motion, orthographic projection, camera rotation,
or an orthographic plane, with Gaussian noise $\sigma = 1$ added to the projected match
coordinates. Each model is estimated for each data set and the mean of the
SSE recorded. It can be seen that just picking the model with lowest SSE will in
general always lead to choosing the most general model $\mathcal{R}$ = F, thus the need for a
more sophisticated model selection method. Fisher was aware of the limitations
of maximum likelihood estimation and admits the possibility of a wider form of
inductive argument that would determine the functional form of the data (Fisher
1936, p. 250 [14]); but then goes on to state 'At present it is only important to
make clear that no such theory has been established'.



                                    Point Motion
Estimated         General        Orthographic   Rotation        Affinity
                  F              Fa             H               Ha
Fundamental F     93.074 (93)    87.037         80.6162         78.378
Affine Fa         978.350        96.448 (96)    806.389         85.875
Homography H      4986.881       4834.735       193.964 (192)   189.132
Affinity Ha       4993.045       4967.894       1023.118        191.643 (194)

Table 2. Mean SSE for 100 matches over 100 trials. Variance of noise on the
coordinates: $\sigma^2 = 1$. Bracketed values are the expected value if the model is
correct.
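The bracketed values in Table 2 are consistent with the usual residual degrees
of freedom, N - p = 4n - (k + nd), for unit noise variance. The Python sketch
below checks this arithmetic. The (k, d) pairs are assumed to be those of table 1;
they agree with the penalties 2(nd + k) = 614, 608, 416 and 412 quoted later for
n = 100. The names `MODELS`, `num_parameters` and `expected_sse` are illustrative.

```python
# (k, d): degrees of freedom of the relation and dimension of its variety.
# Assumed values, consistent with the penalties quoted in the text for n = 100.
MODELS = {"F": (7, 3), "Fa": (4, 3), "H": (8, 2), "Ha": (6, 2)}


def num_parameters(model, n):
    """Total number of estimated parameters p = k + n*d (sigma excluded)."""
    k, d = MODELS[model]
    return k + n * d


def expected_sse(model, n, sigma=1.0):
    """Residual degrees of freedom (N - p) times the noise variance, N = 4n."""
    observations = 4 * n
    return (observations - num_parameters(model, n)) * sigma ** 2


for name in MODELS:
    print(name, num_parameters(name, 100), expected_sse(name, 100))
```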



4 Model Selection — Hypothesis Testing 

One way of model comparison is via hypothesis testing. This procedure tests
the null hypothesis that one relation $\mathcal{R}_1$ describes the data by comparing it to
an alternate hypothesis that the relation $\mathcal{R}_2$ describes the data. The relations
must be nested, so that the parameters of the more general model 1 include all
the parameters of the less general model 2 in addition to some extra ones.








A common way to do this is by the likelihood ratio test (e.g. see [31, 35]). To do
this the MLE of the parameters $\hat{\theta}_1$ and $\hat{\theta}_2$ for both relations must be recovered.
Then the test statistic

$$ \lambda(\kappa) = 2 \log \frac{\Pr(\kappa \mid \mathcal{R}_1, \hat{\theta}_1)}{\Pr(\kappa \mid \mathcal{R}_2, \hat{\theta}_2)} = 2(L_1 - L_2) \qquad (8) $$

is examined, where $\mathcal{R}_1$ has more parameters than $\mathcal{R}_2$, which asymptotically
follows a $\chi^2$ distribution with $p_1 - p_2$ degrees of freedom, where $p_i$ is the total
number of parameters in model $i$. If $\lambda(\kappa)$ is less than some threshold (determined
by a level of significance $\alpha$) then model 2 is accepted, otherwise model 2 is
rejected, i.e. the test is

$$ \lambda(\kappa) = 2(L_1 - L_2) < \chi^2_{\alpha}(p_1 - p_2). \qquad (9) $$

If $\mathcal{R}_2$ holds then the chance of overfitting is the user specified $\alpha$. If $\mathcal{R}_1$ holds then
the chance of underfitting is an unknown $\beta$, and the quantity $1 - \beta$ is referred to
as the power of the test. The power of the test is obviously related to the choice
of $\alpha$ and the distribution of the data. For instance, if all the matches have small
disparities then the power of the test for a given $\alpha$ is likely to be much lower than
if the disparities are higher. Ideally $\alpha$ should be chosen so that the chance of
overfitting ($\alpha$) and underfitting ($\beta$) are small (i.e. the power of the test is high).
In the Neyman-Pearson theory of statistical hypothesis testing only the
probabilities of rejecting and accepting the correct and incorrect hypotheses,
respectively, are considered to define the cost of a decision. The problem with this
approach is that it is difficult to adapt to a situation where several models might be
appropriate, as the test procedure for a multiple-decision problem involves a
difficult choice of a number of dependent significance levels. This suggests a
different approach in which a scoring mechanism is used to rank each model.
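A minimal sketch of the nested likelihood ratio test follows, using only the
Python standard library. Instead of looking up a critical value it compares the
tail probability of the statistic with alpha, which is an equivalent decision.
The helper `chi2_sf` (chi-square survival function for integer degrees of
freedom) and the function `accept_simpler_model` are illustrative names, and the
log likelihoods in the usage lines are made-up numbers.

```python
import math


def chi2_sf(x, dof):
    """P(X > x) for a chi-square variable X with integer dof >= 1."""
    z = x / 2.0
    if dof % 2 == 0:
        a, q = 1.0, math.exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))
    # Recurrence on the regularised upper incomplete gamma function:
    # Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    while a < dof / 2.0 - 1e-9:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


def accept_simpler_model(log_lik_1, log_lik_2, p1, p2, alpha=0.05):
    """True if the simpler model 2 is not rejected at significance level alpha.

    Model 1 is the more general model, so p1 > p2 and log_lik_1 >= log_lik_2.
    """
    stat = max(2.0 * (log_lik_1 - log_lik_2), 0.0)  # test statistic
    return chi2_sf(stat, p1 - p2) > alpha


print(accept_simpler_model(-50.0, -51.0, p1=307, p2=304))  # small gain in fit
print(accept_simpler_model(-50.0, -60.0, p1=307, p2=304))  # large gain in fit
```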

As seen, maximum likelihood methods will always lead to the most general model
being selected; hence the need for a more general method of inductive inference
that takes into account the complexity of the model. This has led to the
development of various information criteria (see the special issue of Psychometrika on
information criteria, Vol. 52, No. 3). Foremost amongst these is 'an information
criterion' (AIC) (Akaike 1974), described next.

5 AIC for Model Selection 

Akaike's information criterion is a useful statistic for model identification and
evaluation. Akaike (1974) was perhaps the first to lay the foundations of
information theoretic model evaluation. He developed a model selection procedure, for
use in auto-regressive modelling of time series, that chose the model with
minimum expected prediction error for future observations as the best fitting.

Footnote: asymptotically, meaning as the number of observations tends to infinity.







The procedure selects the model that minimizes the expected error of new observations
with the same distribution as the ones used for fitting. It has the form

$$ \mathrm{AIC} = -2L + 2p, \qquad (10) $$

where $p$ is the number of parameters in the chosen model, and $L$ is the log
likelihood. It can be seen that AIC has two terms, the first corresponding to the
badness of fit, the second a penalty on the complexity of the model. When there
are several competing models, the parameters within the models are estimated
by maximum likelihood and the AIC scores compared to find the model with the
minimum value of AIC. This procedure is called the minimum AIC procedure,
and the model with the minimum AIC is called the minimum AIC estimate
(MAICE), which is chosen as the best model. Therefore the best model is the
one with highest information content but least complexity. An advantage of the
AIC is its simplicity: it does not require reference to look up tables, and it is
very easy to calculate AIC once the maximum likelihood estimate of the model
parameters is made. Furthermore, Akaike claims that there is no problem of
specifying an arbitrary significance level at which models should be acceptable,
and comparison between two models need not be nested or ordered.

The AIC is developed from the idea that the best model is that which
minimizes the expected SSE for future data. Consider the case of fitting a variety of
dimension $d$ to $r$ dimensional points (recall our definition of relations in terms of
varieties in section 2); in this case the codimension is $r - d$. Kanatani [26] points
out that the AIC in this case is

$$ \mathrm{AIC} = -2L + 2(dn + k). \qquad (11) $$

Kanatani's derivation of the AIC for least squares is rather drawn out, and the
interested reader is referred to his book [26]. In fact Akaike [3] gave a similar
form for the AIC in the case of factor analysis when fitting models of differing
dimensions. Rather than present it here, an intuitive interpretation is presented
in the next section for the two dimensional case $r = 2$, fitting a line model $d = 1$
and a point model $d = 0$.

Intuitive Interpretation. Consider equation (11). The first term is the usual
sum of squares of residuals, divided by their variances, representing the goodness
of fit. The next two terms represent the parsimony of the model. The second
is a penalty term for the dimensionality of the model: the greater the dimension
of the model, the greater the penalty. The last term is the usual AIC criterion
of adding the number of parameters of the model, to penalize models with more
parameters.

This is now illustrated by a simple example. Consider the two dimensional
example shown in figure 1. Suppose points are generated from a fixed location
with mean zero, unit standard deviation, Gaussian noise in both the $x$
and $y$ coordinates. If a point and a line are fitted separately by minimizing the
sum of squared Euclidean distances, the optimally fitted point (the centroid)
will lie on the optimally fitted line [40]. Let the sum of squared distances of
the points to the line model be $\sum e_\perp^2$ and the sum of squared distances of the
points to the point model be $\sum e_p^2$; then $\sum e_p^2 = \sum e_\perp^2 + \sum e_\parallel^2$, where $\sum e_\parallel^2$ is
the 'parallel' sum of squared distances as shown for one point in Figure 1. It can be
seen that unless the data all lie exactly on a point, $\sum e_\perp^2$ is always less
than $\sum e_p^2$. The AIC for the line model compensates for this bias by the penalty
term, which is twice the expectation of the 'parallel' sum of squares. If
the model estimated is a line then the AIC has the form

$$ \mathrm{AIC(line)} = \sum e_\perp^2 + 2n + 4, \qquad (12) $$

as the model has dimension one, codimension one and two degrees of freedom
in the parameters. If the number of the data is large, the degree of freedom of
the model (i.e. the number of the parameters) has little effect because it is
simply a constant. What matters is twice the dimension of the model, which is
multiplied by the number of the data. The dimension equals the 'internal' degree
of freedom of the data, which in turn equals the expectation of the 'parallel' (or
in a direction on the manifold) sum of squares per datum. Returning to the
example, the AIC for a point is

$$ \mathrm{AIC(point)} = \sum e_\perp^2 + \sum e_\parallel^2 + 4, \qquad (13) $$

thus a point is favoured if AIC(point) < AIC(line), i.e. $\sum e_\parallel^2 < 2n$, and the
algorithm is equivalent to a test of spread along the line.

Fig. 1. Showing the relationship between the noisy points, the optimally estimated
line and the centroid of the noisy points. For Gaussian noise the optimally
estimated line is that which minimizes the sum of squares of perpendicular distances,
and consequently it passes through the centroid of the data. For each point the
distance to the centroid may be broken up into two components, one parallel and
one perpendicular to the line.

Footnote: Akaike later demonstrated that AIC was an estimate of the expected entropy
(Kullback-Leibler information) of the fitted distribution for the observed sample
against their true one, showing that the model with the minimum AIC score also
minimized the expected entropy, thus providing one way of generalizing MLE.
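The line versus point comparison can be sketched in a few lines of Python. The
sketch assumes unit noise variance and uses the fact that, for a total least
squares line through the centroid, the perpendicular and parallel sums of
squares are the smallest and largest eigenvalues of the 2x2 scatter matrix. The
function names and the two toy data sets are illustrative only.

```python
import math


def scatter_eigenvalues(points):
    """Eigenvalues (largest, smallest) of the 2x2 scatter matrix about the centroid."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points)
    syy = sum((y - cy) ** 2 for _, y in points)
    sxy = sum((x - cx) * (y - cy) for x, y in points)
    half_trace = 0.5 * (sxx + syy)
    root = math.sqrt(0.25 * (sxx - syy) ** 2 + sxy ** 2)
    return half_trace + root, half_trace - root


def aic_line_and_point(points):
    """AIC of the line model (12) and of the point model (13), unit noise variance."""
    n = len(points)
    sum_parallel, sum_perp = scatter_eigenvalues(points)
    aic_line = sum_perp + 2 * n + 4
    aic_point = sum_perp + sum_parallel + 4
    return aic_line, aic_point


# Elongated data: large spread along the line, so the line model should win.
elongated = [(float(i), 0.1 * (-1) ** i) for i in range(20)]
# Compact data: parallel spread well below 2n, so the point model should win.
compact = [(0.5, 0.5), (-0.5, 0.5), (0.5, -0.5), (-0.5, -0.5)] * 5

print(aic_line_and_point(elongated))
print(aic_line_and_point(compact))
```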

Test results using AIC. Consider the average SSE given in table 2 for 100
data points. This can be turned into AIC by the addition of 614, 608, 416, and
412, i.e. $2(nd + k)$, for F, Fa, H and Ha respectively, tabulated in table 4. It can be
seen that on average the lowest AIC equates to the correct model, although
it behaves less well for distinguishing F from Fa than F from H. Generally
the AIC tends to underestimate the dimension of the data and overestimate
the number of motion model parameters. The reason for this can be seen in the
context of the likelihood test explained in the last section. Consider using AIC
to compare two models $\mathcal{R}_1$ and $\mathcal{R}_2$ such that the model with lowest AIC is
accepted, i.e. model 2 is accepted if $\mathrm{AIC}_2 - \mathrm{AIC}_1 < 0$, or if

$$ 2(L_1 - L_2) < 2(p_1 - p_2), \qquad (14) $$

which can be seen to be directly equivalent to (9). Following this line of thought,
the significance level of the AIC criterion is given by the significance level of the
$\chi^2$ distribution with $(p_1 - p_2)$ degrees of freedom and critical value $2(p_1 - p_2)$. For
$(p_1 - p_2) = 2$, the difference in the number of parameters between H and Ha, this
leads to $\alpha = 0.135$ or a 13.5 percent chance of overfitting; for $(p_1 - p_2) = 3$, the
difference in number of parameters between F and Fa, this leads to $\alpha = 0.11$ or
an 11 percent chance of overfitting. Some typical values are given in table 3. It can be
seen that this is borne out experimentally by scrutinizing table 5. As $(p_1 - p_2)$
increases the value of $\alpha$ becomes smaller (less than 0.005 for $(p_1 - p_2) = 20$).

p1 - p2   1      2      3      4      5      6      8      10     12     14     20
alpha     0.156  0.135  0.111  0.091  0.074  0.061  0.042  0.029  0.020  0.014  0.005

Table 3. Calculated values for significance level $\alpha$ given a $\chi^2$ with $(p_1 - p_2)$
degrees of freedom and critical value/threshold $2(p_1 - p_2)$.
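The implied significance level of the minimum AIC rule can be recomputed as the
tail probability of a chi-square variable with (p1 - p2) degrees of freedom beyond
the threshold 2(p1 - p2). The sketch below does this with the Python standard
library; `chi2_sf` and `aic_significance` are illustrative helper names. The
computed values agree with the tabulated ones to within about 0.002.

```python
import math


def chi2_sf(x, dof):
    """P(X > x) for a chi-square variable X with integer dof >= 1."""
    z = x / 2.0
    if dof % 2 == 0:
        a, q = 1.0, math.exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))
    # Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    while a < dof / 2.0 - 1e-9:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


def aic_significance(extra_params):
    """Chance that the AIC rule prefers the larger model when the smaller is true."""
    return chi2_sf(2.0 * extra_params, extra_params)


for diff in (1, 2, 3, 4, 5, 6, 8, 10, 12, 14, 20):
    print(diff, round(aic_significance(diff), 3))
```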



                                    Point Motion
Model Selected    General        Orthographic   Rotation        Affinity
                  F              Fa             H               Ha
Fundamental F     707.074        701.037        694.6162        692.378
Affine Fa         1586.350       704.448        1414.389        683.875
Homography H      5402.881       5240.735       609.964         605.132
Affinity Ha       5405.045       5379.894       1435.118        603.643

Table 4. Mean AIC for 100 matches over 100 trials. It can be seen that the
chance of overfitting dimension is small relative to the chance of overfitting the
degree; i.e. average AIC for F lower than for Fa.
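The conversion from mean SSE to mean AIC can be sketched as follows, assuming
that -2L is taken to be the SSE divided by the noise variance (constants
dropped) and that the (k, d) pairs are 7, 3 for F; 4, 3 for Fa; 8, 2 for H; and
6, 2 for Ha, which reproduce the penalties 614, 608, 416 and 412 for n = 100.
The function names are illustrative.

```python
MODELS = {"F": (7, 3), "Fa": (4, 3), "H": (8, 2), "Ha": (6, 2)}  # assumed (k, d)


def aic_from_sse(sse, model, n, sigma=1.0):
    """AIC = SSE / sigma^2 + 2 (n d + k), with constant terms dropped."""
    k, d = MODELS[model]
    return sse / sigma ** 2 + 2 * (n * d + k)


def select_min(scores):
    """Name of the model with the smallest score."""
    return min(scores, key=scores.get)


# Mean SSE from Table 2, column "General" (data generated with a random F).
sse_general = {"F": 93.074, "Fa": 978.350, "H": 4986.881, "Ha": 4993.045}
aic_general = {m: aic_from_sse(s, m, 100) for m, s in sse_general.items()}
print(aic_general, select_min(aic_general))
```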



                                    Point Motion
Estimated         General        Orthographic   Rotation        Affinity
                  F              Fa             H               Ha
Fundamental F     99             11             0               0
Affine Fa         1              88             0               0
Homography H      0              0              98              15
Affinity Ha       0              0              2               85

Table 5. Number of times each model selected over 100 trials, using AIC for
each of the four motion types. It can be seen that AIC tends to overfit the degree
of the model.

Recall that the number of parameters within each model has two components,
$p = k + nd$: the first, $k$, is the number of parameters in the relation; the second, $nd$,
is a number of parameters proportional to the quantity of data. This suggests
that a more general form, the geometric robust information criterion GRIC,

$$ \mathrm{GRIC} = -2L + \lambda_1 dn + \lambda_2 k, \qquad (15) $$

might be appropriate, with $\lambda_1$ and $\lambda_2$ chosen to reduce misfits. Issues in
determining values for these parameters, and suggested values for them, are now
discussed. Higher values of $\lambda_1$ and $\lambda_2$ decrease $\alpha$ but increase $\beta$, decreasing the
power of the test; thus the two parameters should be chosen with an eye to
minimizing $\alpha$ and $\beta$. For the estimation of two view geometry from feature matches,
$\lambda_1 = 2$ and $\lambda_2 = 4$ have provided good results over a wide range of conditions.

The first parameter $\lambda_1$ influences the decision as to whether the relation
should be dimension 3 or 2; for $n \ge 20$, $\alpha < 0.005$, which may be considered
acceptably low. Thus there is little chance of overfitting the dimension, as this
adds a number of parameters equal to the number of matches, usually sufficiently
high to guarantee low $\alpha$. Now consider decomposing a general motion into a plane
plus parallax motions [24]; the size of $\beta$ depends on the amount of parallax.
Setting $\lambda_1 = 2$ means that the amount of parallax needs to be on average
greater than 2.0 pixels to identify a non homography relation.

Recall that there is a high probability of overfitting the degree of the relation
for two models of the same dimension. For the second parameter, $\lambda_2 = 4.0$ ensures
that for $(p_1 - p_2) = 1$, $\alpha = 0.0456$, which prevents the tendency to overfit the
degree of the relation whilst not significantly affecting the power of the test
(which is dominated by the choice of $\lambda_1$ for large data sets). The average GRIC
is given in table 6 and the models selected in table 7; it can be seen that the
GRIC outperforms the standard AIC.
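To illustrate how the heavier penalty on the relation parameters changes the
decision, the sketch below evaluates (15) on the mean SSE values of Table 2 for
orthographic data. It assumes that -2L is the plain SSE over the noise variance
(no robust truncation) and the (k, d) pairs 7, 3 for F and 4, 3 for Fa; the
function name `gric` is illustrative. Setting lam1 = lam2 = 2 recovers the AIC
of (11).

```python
def gric(sse, k, d, n, sigma=1.0, lam1=2.0, lam2=4.0):
    """GRIC = -2L + lam1 * d * n + lam2 * k, with -2L taken as SSE / sigma^2."""
    return sse / sigma ** 2 + lam1 * d * n + lam2 * k


n = 100
# Mean SSE on orthographic data (Table 2): F fits slightly better than Fa.
sse_f, sse_fa = 87.037, 96.448

aic_f = gric(sse_f, k=7, d=3, n=n, lam2=2.0)
aic_fa = gric(sse_fa, k=4, d=3, n=n, lam2=2.0)
gric_f = gric(sse_f, k=7, d=3, n=n)
gric_fa = gric(sse_fa, k=4, d=3, n=n)

print("AIC :", aic_f, aic_fa)    # F scores lower on average: the degree is overfitted
print("GRIC:", gric_f, gric_fa)  # Fa scores lower: the correct model is preferred
```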

AIC Variants: The fact that the AIC tends to overfit is generally
recognized in the literature. Bozdogan [7] attempts to derive measures that are
asymptotically consistent:

$$ \mathrm{CAIC} = -2L + p(\log N + 1), $$
$$ \mathrm{CAICF} = -2L + p(\log N + 2) + \log |J|, $$

where $J$ is the information matrix of the estimated parameters. Unfortunately
both of these measures tend to chronically underfit; they have an expected similarity
to the BIC approximation to the Bayes factors discussed in the next section.

                                    Point Motion
Model Selected    General        Orthographic   Rotation        Affinity
                  F              Fa             H               Ha
Fundamental F     721.074        715.037        708.6162        706.378
Affine Fa         1592.350       712.448        1422.389        691.875
Homography H      5410.881       5248.735       617.964         613.132
Affinity Ha       5411.045       5385.894       1441.118        609.643

Table 6. Mean GRIC for 100 matches over 100 trials.

                                    Point Motion
Estimated         General        Orthographic   Rotation        Affinity
                  F              Fa             H               Ha
Fundamental F     99             1              0               0
Affine Fa         1              98             0               0
Homography H      0              0              98              3
Affinity Ha       0              1              2               97

Table 7. Number of times each model selected over 100 trials, using GRIC for
each of the four motion types.



6 Bayes Factors 



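Within this section the Bayesian approach to model comparison is introduced.
Suppose that the set of matches $\kappa$ is to be used to determine between $K$
competing motion models with relations $\mathcal{R}_1, \ldots, \mathcal{R}_K$ with parameter vectors $\theta_1, \ldots, \theta_K$.
Then by Bayes' theorem, the posterior probability that $\mathcal{R}_k$ is the correct relation
is

$$ \Pr(\mathcal{R}_k \mid \kappa) = \frac{\Pr(\kappa \mid \mathcal{R}_k)\,\Pr(\mathcal{R}_k)}{\sum_{i=1}^{K} \Pr(\kappa \mid \mathcal{R}_i)\,\Pr(\mathcal{R}_i)}; \qquad (16) $$

note that by construction $\sum_{i=1}^{K} \Pr(\mathcal{R}_i \mid \kappa) = 1$. All the probabilities are implicitly
conditional on the set of relations $\mathcal{R}_1, \ldots, \mathcal{R}_K$ being considered. In the case of
image matching the models F, Fa, H, Ha should suffice to completely describe
most situations. The marginal probability $\Pr(\kappa \mid \mathcal{R}_k)$ is obtained by integrating
out $\theta_k$,

$$ \Pr(\kappa \mid \mathcal{R}_k) = \int \Pr(\kappa \mid \mathcal{R}_k, \theta_k)\,\Pr(\theta_k \mid \mathcal{R}_k)\,d\theta_k \qquad (17) $$

$$ = \int \text{likelihood} \times \text{prior}\;d\theta_k. \qquad (18) $$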





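The extent to which the data supports $\mathcal{R}_i$ over $\mathcal{R}_j$ is measured by the
posterior odds,

$$ \frac{\Pr(\mathcal{R}_i \mid \kappa)}{\Pr(\mathcal{R}_j \mid \kappa)} = \frac{\Pr(\kappa \mid \mathcal{R}_i)}{\Pr(\kappa \mid \mathcal{R}_j)} \cdot \frac{\Pr(\mathcal{R}_i)}{\Pr(\mathcal{R}_j)}. \qquad (19) $$

$B_{ij} = \Pr(\kappa \mid \mathcal{R}_i) / \Pr(\kappa \mid \mathcal{R}_j)$ is called a Bayes factor, the term originated by Good, the method
attributed by Good to Turing and Jeffreys [25]. It is similar to the likelihood ratio
for model comparison but involves integration of the probability distributions
rather than comparing their maxima. The first term on the right hand side of
(19) is the ratio of two integrals given in (18). The second factor is the prior
odds, which is here set to 1, representing the absence of any prior preference
between the two relations, i.e. $\Pr(\mathcal{R}_i) = \Pr(\mathcal{R}_j)$. Thus (19) can be rewritten

$$ \text{posterior odds} = \text{Bayes factor} \times \text{prior odds}. \qquad (20) $$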



6.1 Calculating Bayes Factors 



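In order to compute the Bayes factor, the prior distributions $\Pr(\theta_k \mid \mathcal{R}_k)$ of each
model must be specified. This is both good and bad: good as it allows the
incorporation of prior information (such as the estimate of the relation from
previous frames), bad because these prior densities are hard to obtain when
there is no such information.

The easiest approach is to use the BIC approximation, which assumes that
$\Pr(\theta_k \mid \kappa, \mathcal{R}_k)$ is approximately normal with mean $\hat{\theta}_k$ and covariance matrix $H$; the
derivation is given in appendix A. The BIC approximation for the $k$th model
is

$$ \mathrm{BIC}_k = -2L_k + p \log N, \qquad (21) $$

the model with lowest BIC being most likely. Assuming that the prior on the
models $\Pr(\mathcal{R}_k)$ is uniform, the probability of each model may be calculated as

$$ \Pr(\mathcal{R}_k \mid \kappa) = \frac{\exp(-\mathrm{BIC}_k / 2)}{\sum_{i=1}^{K} \exp(-\mathrm{BIC}_i / 2)}. \qquad (22) $$

Given two views, simpler models will be favoured over more complex ones as the
$p \log N$ term will dominate, but as the number of images increases the likelihood
function $\log \Pr(\kappa \mid \mathcal{R}_k)$ will take precedence.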
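A minimal sketch of turning BIC scores into posterior model probabilities under
a uniform model prior is given below. It assumes the convention BIC = -2L + p log N,
so that each model is weighted by exp(-BIC/2); if BIC is defined with a different
scaling the exponent changes accordingly. The function names and the example
scores are illustrative.

```python
import math


def bic(neg2_log_lik, p, num_observations):
    """BIC = -2L + p log N."""
    return neg2_log_lik + p * math.log(num_observations)


def posterior_from_bic(bics):
    """Posterior model probabilities, uniform prior: exp(-BIC/2), normalised.

    The smallest BIC is subtracted first so that the exponentials do not
    underflow when the scores are large.
    """
    best = min(bics.values())
    weights = {m: math.exp(-0.5 * (b - best)) for m, b in bics.items()}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}


scores = {"F": 1012.0, "Fa": 1010.0, "H": 1030.0}  # made-up BIC values
print(posterior_from_bic(scores))
```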

Test results using BIC. The BIC is attractive because of its simple form,
which makes it easy to compute. Unfortunately the BIC approximation
consistently underfits the model, favouring models of too low a dimension. This is
due to the poor approximation to equation (29). Consider 100 matches; then
$N = 400$, the number of observations, and the number of parameters to
estimate is $p = 307$ for the fundamental matrix and 208 for a homography. This
leads to a BIC penalty term of $307 \ln 400$ for F and $208 \ln 400$ for H, a difference
of 593.15. Generally a homography would have to be an exceptionally poor fit
between two views for its BIC to be that much larger than that for a
fundamental matrix. This would only occur with a very large baseline and very
large perspective effects. The problem comes from the approximation of $\log |H|$ by
$p \log N$ (see appendix A), which is generally a poor approximation if $N < 5p$ [27],
but also a poor approximation for the Hessian of the nuisance parameters.
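The size of the penalty gap quoted above can be checked directly. The short
Python sketch below assumes k = 7, d = 3 for F and k = 8, d = 2 for H, which give
the stated parameter counts 307 and 208 for n = 100; the variable names are
illustrative.

```python
import math

n = 100                      # number of matches
num_observations = 4 * n     # N: four coordinates per match
p_f = 7 + 3 * n              # parameters for the fundamental matrix, k + n*d
p_h = 8 + 2 * n              # parameters for the homography

# Difference between the two BIC penalty terms p * ln(N).
penalty_gap = (p_f - p_h) * math.log(num_observations)
print(p_f, p_h, penalty_gap)
```

For H to be rejected in favour of F, its sum of squared residuals would have to
exceed that of F by more than this gap.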

Other Ways to Approximate Bayes Factors. The crux of the problem
with Bayes factors is the choice of prior $\Pr(\theta_k \mid \mathcal{R}_k)$, as this may influence the
result. The BIC approximation finesses this detail by assuming the prior is very
diffuse, whereas ideally the Bayes factors should be evaluated over a range of
priors to check their stability. Aitkin [1] suggests using the posterior PDF of
$\theta$ to compute what he calls posterior Bayes factors. However, as Akaike points
out, "the repeated use of one and the same sample in the posterior mean of
the likelihood function for the definition and evaluation of posterior density
certainly introduces a particular type of bias that invalidates the use of the mean
as the likelihood of the model". Another variation is to divide the data into two,
using one part to estimate a prior and another to perform the model selection. In
data contaminated with outliers this procedure is fraught with peril, e.g. all the
outliers may lie in one of the sets. One can use uniform or diffuse priors, but great care
must be taken when doing this or Lindley's paradox may occur [10], in which one
model is arbitrarily favoured over another. The problem is that flat priors are
specified only up to an undefined multiplicative constant.



6.2 Modified BIC for Least Squares Problems

It is apparent that the penalty term for the number of nuisance parameters due
to the dimension of the relation, and the number of parameters due to the degree of
the relation, should be weighted differently. In the appendix it is explained how
the BIC is obtained by approximating the determinant of the log Hessian (the
information matrix or inverse covariance matrix) by $p \log N$. Should
the estimation of the parameters be influenced by all the data then this is a
reasonable first order approximation. However, for the least squares problem
this is not the case. The estimation of each optimal match $\hat{m}$ (the estimation
of the nuisance or internal parameters, $n$ parameters with $d$ degrees of freedom,
where $d$ is the dimension of the two view relation) is only affected by the 4
noisy coordinates of the match $m$, under the assumption that the matches are
independent. This is characterized by a block diagonal covariance matrix among
matches. This suggests the geometric Bayesian information criterion GBIC,

$$ \mathrm{GBIC} = -2L + \log(4)\,nd + \log(N)\,k, \qquad (23) $$

might be appropriate, with $N = 4n$ (recall $n$ is the number of matches). This
gives very similar performance to GRIC.
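As a sketch of how (23) behaves, the fragment below evaluates a GBIC score from
a sum of squared errors, reading the penalty as log(4) per internal parameter
and log(N) per relation parameter. This reading of (23), the use of plain SSE
over the noise variance for -2L, and the (k, d) pairs 7, 3 for F and 4, 3 for Fa
are assumptions; the function name `gbic` is illustrative.

```python
import math


def gbic(sse, k, d, n, sigma=1.0):
    """GBIC = -2L + log(4) * n * d + log(N) * k, with N = 4n and -2L = SSE / sigma^2."""
    num_observations = 4 * n
    return sse / sigma ** 2 + math.log(4) * n * d + math.log(num_observations) * k


n = 100
# Mean SSE values from Table 2.
print(gbic(93.074, 7, 3, n), gbic(978.350, 4, 3, n))   # general motion data
print(gbic(87.037, 7, 3, n), gbic(96.448, 4, 3, n))    # orthographic data
```

With these mean values the score favours F on general motion data and Fa on
orthographic data, in line with the statement that GBIC behaves much like GRIC.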

7 The Quest for the Universal Prior: MDL 

Within this section the minimum description length principle is outlined, based
upon the idea of parsimony: that the model that requires the easiest coding is best.
This principle has been one of the main rules of science since its inception: "We
are to admit no more causes of natural things than such as are both true and
sufficient to explain their appearances. To this purpose the philosophers say that
Nature does nothing in vain, and more is in vain when less will serve; for Nature
is pleased with simplicity; and affects not the pomp of superfluous causes."
(Newton, Principia, 1726, p. 398). The key intuitive idea is that the simplest
description of the process will asymptotically (as the number of observations
of that process becomes infinite) be functionally equivalent to the true one.

Rissanen [42] (1978) developed a criterion with a similar form to the BIC
from a totally different standpoint. He derived the minimum-bit representation
of the data, termed SSD (shortest description length) and MDL (minimum
description length), an approach suggested by the idea of algorithmic complexity
(Solomonoff [48] and Kolmogorov [29]). Wallace and Boulton [58] developed a
very similar idea to MDL called the minimum message length (MML) approach.
The criteria MDL and MML are based on minimum code lengths: given the
data represented up to a finite precision, one picks the parameters so that the
model they define permits the shortest possible code length. The code length is
the sum of the code length for the model and the code length for the data
given the model, i.e. the error (the two are directly analogous to the log prior
and log likelihood). Given that it takes approximately $-\log_2 \Pr(x)$ bits to encode a
number $x$, it becomes clear that the most frequently occurring observations
should be given the smallest code lengths; hence MDL methods are integrally linked
with Bayesian methods. Bayesian attempts at model selection can be stymied
by the need for prior distributions, which are the Bayesian's philosopher's stone;
MDL appears to promise such a universal prior for the complexity/parsimony
of the model. However the term derived,

$$ \mathrm{MDL} = -2L + p \log N, \qquad (24) $$

has the same form and deficiencies as the BIC. But this is only a first order
approximation to the optimal code length. Wallace and Freeman [59] (1987) further
developed and extended MML, leading to a very similar criterion to Bozdogan's CAICF.

Test results using MDL: Generally MDL criteria are the same as Bayesian
ones and produce similar results.

8 Bayesian Model Selection and Model Averaging 

A criterion not considered for model selection here is inductive reasoning. It dates
back at least to the Greek philosopher Epicurus (342?-270? B.C.), who proposed
the following approach: "Principle of Multiple Explanations: If more than
one theory is consistent with the observations, keep all theories" [32]. Thus, as
pointed out by his follower Lucretius (95-55 B.C.), if there are several
explanations as to why a man died, but one does not definitively know which one is true,
then Lucretius (and later Solomonoff [32]) advocates maintaining them all for
the purpose of prediction (somewhat the opposite of Ockham's point of view).

So far we have described methods in which all the potential models are fitted
and a model selection procedure is used to select which is best. It is then adopted
in preference to other defensible models. This is somewhat arbitrary ("a quiet
scandal" [8]), first admitting that there is model uncertainty by searching for
a "best" model and then ignoring this uncertainty by making inferences and
forecasts as if it were certain that the chosen model is actually correct. Selection
of just one model means that the uncertainty on some parameters is
underestimated [21, 36], as will now be demonstrated. Hjorth distinguishes between global
parameters, which are defined for all models, and local parameters, which are not.
Consider figure 2, depicting a two dimensional slice of parameter space for two
global parameters. As the 2 parameters are altered (fixing for a moment all
other parameters) the model selection criterion (e.g. AIC) will lead to different
models being selected, given fixed data. This will not (significantly) affect the
uncertainty of a parameter estimated at point A in figure 2, as it is away from a
boundary; but point B will have its uncertainty incorrectly estimated unless
the fact that multiple models are being considered is taken into account, as the
model parameterization will change over the boundary between model 1 and
model 2. Furthermore, there is no chance for model 1 to take values of the
parameters within the regions that AIC allocates to model 2 or 3 for this data
set, even if that is the correct answer.

Hjorth [21] gives a rather neglected bias theorem, which is intuitively obvious:
that the expected value of the MAIC will be less than the minimum of the
expected value of the AIC for all the models under consideration. Thus if model
$i$ were the true model, the overall expectation of the MAIC would be less than
$\mathrm{AIC}_i$ (occasionally the wrong model is selected because it has lower AIC),
which in turn means that the residuals and covariance are slightly lower than
would be expected.




Fig. 2. Two cases: A, where the uncertainty in the model is unimportant;
B, where the uncertainty in the model becomes important.



Bayesian model averaging takes into account explicitly model uncertainty by
representing the data (each set of matches) by a combination of models [11, 22,
27, 30, 37]. Rather than using one motion model, a standard Bayesian formalism
is adopted which averages the posterior distributions of the prediction under
each model, weighted by their posterior model probabilities.

One of the main advantages of model averaging comes in prediction. If a new
match is observed then its likelihood is computed as a weighted average over all
the models. Indeed it has been shown that model averaging can produce superior
results in predictive performance than commitment to a single model [34].
Suppose that there are $K$ competing motion models with relations $\mathcal{R}_1, \ldots, \mathcal{R}_K$ with
parameter vectors $\theta_1, \ldots, \theta_K$ that could describe $\kappa$. Then Bayesian inference
about $m_i$ is based on its posterior distribution, which is

$$ \Pr(m_i \mid \kappa) = \sum_{k=1}^{K} \Pr(m_i \mid \mathcal{R}_k, \kappa)\,\Pr(\mathcal{R}_k \mid \kappa) \qquad (25) $$

by the law of total probability [30]. Thus the full posterior probability of $m_i$
is a weighted average of its posterior distribution under each of the models,
where the weights are the posterior model probabilities $\Pr(\mathcal{R}_k \mid \kappa)$ derived in
(22). Equation (25) provides inference about $m_i$ that takes into full account the
uncertainty between models, and is particularly useful when two of the models
describe the data equally well.
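Equation (25) is a weighted sum, which the following Python sketch makes
explicit. The per-model predictive densities and posterior model probabilities
in the usage lines are made-up numbers, and the function name `model_average`
is illustrative.

```python
def model_average(predictive, posterior):
    """Weighted average of per-model predictive densities, as in (25).

    predictive[m]: density of the new match under model m given the data.
    posterior[m]:  posterior probability of model m given the data.
    """
    if abs(sum(posterior.values()) - 1.0) > 1e-9:
        raise ValueError("posterior model probabilities must sum to one")
    return sum(predictive[m] * posterior[m] for m in posterior)


# Two models that describe the data almost equally well (hypothetical values).
posterior = {"F": 0.6, "H": 0.4}
predictive = {"F": 0.20, "H": 0.05}
print(model_average(predictive, posterior))
```

When one model has posterior probability close to one, the average reduces to
the prediction of that single model, which matches the observation that model
averaging and model selection often give similar results.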

Test results using model averaging. As there is no model selection
procedure, a slightly different test criterion had to be used. The results for model
averaging were assessed by examining how points were correctly classified as inlying
or outlying, using the combination of models for the classification (elsewhere
model averaging is assessed for segmentation [55]). Overall the results were
disappointing, with there being very little difference between the model averaging
approach and using the model suggested by the model selection procedure used to
generate the posterior distribution over the models (AIC and BIC were tried).

9 Results 

All of the model selection procedures have been implemented and compared on
a large test bed of synthetic and real image pairs.

Synthetic Data. Three synthetic databases of one hundred sets of 100 synthetic
matches, with 10-30% outliers, were generated to be consistent with either
random F, Fa, H or Ha. Each of the model selection algorithms was run on
each set and the model chosen compared with the known ground truth.

Real Data. The algorithms have been tested on many images; here two typical
image pairs are shown. In all the examples tested for this paper, the corners
are obtained by using the detector described in [18], and the matching procedure
uses cross correlation in a square search window. The standard deviation of the
error of the point correspondences was estimated robustly [57]. For model
comparison, testing is somewhat easier than in the general case of estimation, as
all that needs to be known of the ground truth are some qualitative aspects in
order to determine which model is the true one; i.e. if the motion is general and
there are discernible perspective effects then F is the true model, if the camera
rotates about its optic centre then H is the true model, if the scene is distant
so as to be near orthographic then Fa is the true model, and if all the points
lie on a distant plane then Ha is the true model; also, if the focal length is long
and the camera rotates, Ha is appropriate. Two typical image pairs from the
database are the buggy and model house.

Fig. 3. Left images, Indoor sequence, camera translating and rotating to fixate on
the house; Right Images, two views of a buggy rotating on a table. With disparity
vectors for features superimposed.

Buggy. The left two images of Figure 3 are two views of a buggy rotating on a turn
table. Good orthographic but not perspective structure can be generated for this
scene, and the correct (or ground truth) model should be Fa.

Model house data. The right two images of Figure 3 show a scene in which a camera
rotates and translates whilst fixating on a model house. The scene is general
motion: this is because the translational and rotational components of the camera
motion were both significant, and full perspective structure can be recovered.

Summary of Results 

1. Likelihood ratio test: only really appropriate for comparison of two nested
models.



296 Philip H.S. Torr 



2. AIC tends to correctly reveal the dimension but overfits the degree of the
relation.

3. BIC and MDL tend to greatly underfit the dimension and degree.

4. The geometric robust information criterion GRIC (15) with λ1 = 2.0, λ2 = 4.0
and λ3 = 4.0 produced the most consistently good results, so far.

5. Model averaging produced little improvement together with increasing the
amount of computation necessary to estimate likelihoods.

Results of using the robust model selector GRIC on the real images are given
in Table 8.



Image pair    Estimated motion   n (points)   General   Orthographic   Homography   Affinity
Model House   General            80           596*      618            652          755
Buggy         Orthographic       167          1221      1190*          1240         1450

Table 8. GRIC values for the images. The model with lowest GRIC is marked
with an asterisk.
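As a small illustration of how Table 8 is read, the sketch below picks, for each image pair, the model with the lowest criterion value. The numbers are transcribed from the table; the dictionary layout and the function name `select_model` are illustrative choices, not part of the original paper.

```python
# GRIC scores transcribed from Table 8 (a lower score indicates a preferred model).
GRIC_SCORES = {
    "Model House": {"General": 596, "Orthographic": 618,
                    "Homography": 652, "Affinity": 755},
    "Buggy": {"General": 1221, "Orthographic": 1190,
              "Homography": 1240, "Affinity": 1450},
}


def select_model(scores):
    """Return the name of the model with the lowest criterion value."""
    return min(scores, key=scores.get)


# Apply the selector to every image pair in the table.
SELECTED = {pair: select_model(scores) for pair, scores in GRIC_SCORES.items()}
```

Under this reading the model house pair selects the general motion model and the buggy pair selects the orthographic model, which agrees with the qualitative ground truth described for the two scenes.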



10 Discussion 

Another popular class of methods for model selection includes cross validation,
jackknifing, bootstrap, and data splitting [12, 21], in which the data is divided
into two parts: one is used to fit the relation and the other is used to evaluate the
goodness of fit. However such procedures are very computationally intensive and
highly sensitive to outliers, and results on real images are poor. In the asymptotic
case Stone [49] demonstrates that AIC and cross validation are equivalent. The
results are somewhat better in the outlier free case, but removal of the outliers
presupposes knowledge of the correct relation for the data.
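The data splitting idea can be sketched in a few lines. The example below is a toy stand-in, not the two-view setting of the paper: it fits polynomials of increasing degree to one half of a synthetic data set and scores each on the other half. The polynomial family, the noise level and the function name `holdout_errors` are all assumptions made for illustration.

```python
import numpy as np


def holdout_errors(x, y, degrees, seed=0):
    """Fit each polynomial degree on one half of the data, score it on the other half."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))
    half = len(x) // 2
    fit_idx, eval_idx = order[:half], order[half:]
    errors = {}
    for degree in degrees:
        coeffs = np.polyfit(x[fit_idx], y[fit_idx], degree)
        residuals = y[eval_idx] - np.polyval(coeffs, x[eval_idx])
        errors[degree] = float(np.mean(residuals ** 2))  # held-out mean squared error
    return errors


# Synthetic data from a quadratic relation with small Gaussian noise and no outliers.
rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 200)
y = 1.0 + 2.0 * x - 3.0 * x ** 2 + rng.normal(0.0, 0.05, x.size)

errors = holdout_errors(x, y, degrees=[0, 1, 2, 3])
best_degree = min(errors, key=errors.get)
```

The sketch uses outlier free data on purpose; as the text notes, held-out scores of this kind degrade badly once outliers are present, because a few gross errors dominate the mean squared residual.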

It has been observed that the construction of a prior distribution for each
relation is the crux of determining the Bayes factors, and yet we are not without
prior information about the distributions of such things as F. It is possible to
construct this prior by introducing our subjective prior knowledge (like a true
Bayesian) of camera calibration and the range of possible camera motions; e.g.
it might be known that the principal point is roughly in the centre of the image,
that the aspect ratio is roughly unity and the focal length lies within a certain
range. It might also be known that the camera was a hand held camera moving
at roughly walking pace. By assigning Gaussian probability distributions to these
with meaningful parameters, Monte Carlo sampling methods [15] can be used to
generate the prior for each model, and hence the Bayes factor. It is then possible to
use (29) directly to calculate the Bayes factor. Kass and Raftery [27] review a
large number of such Monte Carlo style techniques for estimating Bayes factors.
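A minimal sketch of the simplest Monte Carlo estimate of a Bayes factor follows: the marginal likelihood of a model is approximated by averaging the likelihood over samples drawn from its prior. The toy problem (unit-variance Gaussian data, a Gaussian prior on the mean versus a fixed zero mean) and all names are assumptions chosen for illustration; they are not the camera models discussed above.

```python
import numpy as np


def log_likelihood(data, mu):
    """Log-likelihood of i.i.d. unit-variance Gaussian data for each candidate mean."""
    mu = np.atleast_1d(mu)
    squared = (data[None, :] - mu[:, None]) ** 2
    return -0.5 * squared.sum(axis=1) - 0.5 * data.size * np.log(2.0 * np.pi)


def log_marginal_mc(data, prior_samples):
    """Estimate log p(data | model) by averaging the likelihood over prior samples."""
    ll = log_likelihood(data, prior_samples)
    peak = ll.max()  # subtract the peak before exponentiating, for numerical stability
    return float(peak + np.log(np.mean(np.exp(ll - peak))))


rng = np.random.default_rng(0)
data = rng.normal(1.0, 1.0, size=50)              # data generated with mean 1
prior_samples = rng.normal(0.0, 1.0, size=20000)  # model 1: mean ~ N(0, 1)

log_m1 = log_marginal_mc(data, prior_samples)
log_m0 = float(log_likelihood(data, 0.0)[0])      # model 0: mean fixed at 0
log_bayes_factor = log_m1 - log_m0
```

Averaging over prior samples is the least efficient of the estimators of this kind, since most samples fall where the likelihood is negligible; it is shown here only because it makes the role of the prior explicit.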

Another interesting question is whether it is necessary to calculate all the
models prior to model selection. All the approaches advocated involve estimation








of all putative models, which is costly in computation time. I have experimented
with the use of covariance matrices (non-robust and robust [43]) in the hope that
by fitting the most general model the covariance matrix might reveal if there are
degeneracies. This works reasonably well if there are few outliers in the data but
poorly otherwise. The alternative approach is to fit the least general model and
assess its goodness of fit, only progressing to a more general model if the fit is
poor. However, again this is problematic in the case where there are outliers, in
that it is hard to distinguish a set of outliers to a low order model from a
set of inliers to the high order model without fitting both models. To quote from
William Blake: “The road of excess leads to the palace of wisdom; You never
know what is enough unless you know what is more than enough” (Proverbs of
Hell).

Now a warning: after model selection, estimates of model parameters and of
the residual variances are likely to be biased. Generally model selection biases are
hard to quantify, but are characterized by deflation of covariance estimates. The
type of bias will very much depend on the algorithm used to perform the model
selection (as well as the criterion of model selection). To quote Pearl [39]: “It
would, therefore, be more appropriate to connect credibility with the nature of
the selection procedure rather than the properties of the final product. When the
former is not explicitly known simplicity merely serves as a rough indicator for
the type of processing that took place prior to the discovery.” The quantification
of these biases is an ongoing topic for research.

11 Conclusion 

Generally there are two problem areas in computer vision: the first is of finding
the correct representation of the data, and the second is manipulating that
representation to make decisions and form hypotheses about the world. Model
selection lies within both areas and it is critical to the design of computer vision
algorithms, and yet often neglected. This can lead to bias in estimation. For
instance there are a large number of papers about the uncertainty of the
fundamental matrix (e.g. [9, 54, 60]) that ignore the fact that there is uncertainty
in the choice of model itself.

Several methods for model selection in the least squares regression problem
have been reviewed, and it has been shown that (a) care must be taken to count
the degrees of freedom in the model, and (b) that careful distinction must be made
between internal (nuisance) and external parameters.

Finally, it can be seen that there are several different model selection
methodologies discussed in this paper. It is apparent that although the model
selection paradigms arose in different areas they have great similarities, as if they
are shadows of some greater theory to come; the discovery of this theory is a line
for future thought. Meanwhile, only by incorporating and understanding model
selection algorithms (which are merely mechanisms for making inference) into
computer vision algorithms can progress be made towards fully automated systems.







Acknowledgments I gratefully acknowledge W. Triggs, A. Fitzgibbon, D. Murray, 
and A. Zisserman for conversations that contributed to this paper. 

Appendix: The BIC Approximation 

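The easiest approach to calculating Bayes factors is to use the BIC approximation,
in which it is assumed that $\Pr(\theta_k \mid D, M_k)$ is approximately normal with mean
$\hat{\theta}_k$ and covariance matrix $\Lambda$. The mean and covariance can be estimated as
follows. Let

$$\phi(\theta_k) = -\log\bigl(\Pr(D \mid \theta_k, M_k)\,\Pr(\theta_k \mid M_k)\bigr) \qquad (26)$$

then $\hat{\theta}_k$ is the estimate of $\theta_k$ such that $\phi(\theta_k)$ is minimized at $\hat{\theta}_k$, with the
Hessian there equal to $\Lambda^{-1}$, so that the covariance is estimated as the inverse
Hessian. Performing a Taylor expansion around $\hat{\theta}_k$ gives

$$\Pr(D \mid M_k) = \int \exp\bigl(-\phi(\theta_k)\bigr)\, d\theta_k$$

$$\approx \int \exp\Bigl(-\phi(\hat{\theta}_k) - \tfrac{1}{2}(\theta_k - \hat{\theta}_k)^{\top} \Lambda^{-1} (\theta_k - \hat{\theta}_k)\Bigr)\, d\theta_k$$

$$= \exp\bigl(-\phi(\hat{\theta}_k)\bigr)\,(2\pi)^{p/2}\,|\Lambda|^{1/2}$$

The last step in the equation above is a standard result for the integration of
multivariate Gaussians. Thus, writing $L = \log \Pr(D \mid \hat{\theta}_k, M_k)$,

$$\log \Pr(D \mid M_k) \approx L + \frac{p}{2}\log 2\pi + \frac{1}{2}\log|\Lambda| + \log \Pr(\hat{\theta}_k \mid M_k) \qquad (27)$$

which is Laplace's approximation [6, 33]. From this, various approximations can be
made, leading to calculable model selection criteria. Perhaps the simplest is
given by Schwarz, who approximates the prior by a normal distribution with
mean $\theta_{\mathrm{prior}}$ and covariance $\Lambda_{\mathrm{prior}}$, leading to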

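$$\Pr(\theta_k \mid M_k) = (2\pi)^{-p/2}\,|\Lambda_{\mathrm{prior}}|^{-1/2} \exp\Bigl(-\tfrac{1}{2}(\theta_k - \theta_{\mathrm{prior}})^{\top} \Lambda_{\mathrm{prior}}^{-1} (\theta_k - \theta_{\mathrm{prior}})\Bigr) \qquad (28)$$

thus

$$\log \Pr(D \mid M_k) \approx L - \tfrac{1}{2}(\hat{\theta}_k - \theta_{\mathrm{prior}})^{\top} \Lambda_{\mathrm{prior}}^{-1} (\hat{\theta}_k - \theta_{\mathrm{prior}}) + \tfrac{1}{2}\log\frac{|\Lambda|}{|\Lambda_{\mathrm{prior}}|}$$

Now $\Lambda^{-1} = \Lambda_{\mathrm{prior}}^{-1} + \Lambda_L^{-1}$, where $\Lambda_L$ is the inverse of the Hessian of the log-likelihood
at $\hat{\theta}_k$, leading to

$$\log \Pr(D \mid M_k) \approx L - \tfrac{1}{2}(\hat{\theta}_k - \theta_{\mathrm{prior}})^{\top} \Lambda_{\mathrm{prior}}^{-1} (\hat{\theta}_k - \theta_{\mathrm{prior}}) - \tfrac{1}{2}\log\bigl|I + \Lambda_{\mathrm{prior}} \Lambda_L^{-1}\bigr| \qquad (29)$$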







If the prior is assumed to be very diffuse then Schwarz [46] suggests discounting
the second term and approximating the Hessian term by $-\frac{p}{2}\log N$, to get

$$\log \Pr(D \mid M_k) \approx \log \Pr(D \mid \hat{\theta}_k, M_k) - \frac{p}{2}\log N = L - \frac{p}{2}\log N \qquad (30)$$

where $p = dn + k$ is the total number of parameters in the system, $N = 4n$ is
the total number of observations, $n$ the number of matches, and $d$ the dimension
of the model. Twice the negative of this, $-2L + p\log N$, is referred to as the Bayesian
Information Criterion or BIC and can be used to compare the likelihood of
competing models.
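The criterion can be written as a one-line function. The sketch below assumes the reading p = dn + k and N = 4n of the definitions above; the two candidate models, their (d, k) values and their log-likelihoods are hypothetical numbers chosen only to show the comparison.

```python
import math


def bic(log_likelihood, d, k, n):
    """BIC = -2L + p log N, with p = d*n + k parameters and N = 4*n observations."""
    p = d * n + k
    N = 4 * n
    return -2.0 * log_likelihood + p * math.log(N)


# Hypothetical fits of two candidate models to the same n = 100 matches.
candidates = {
    "model_a": {"log_likelihood": -410.0, "d": 3, "k": 7},
    "model_b": {"log_likelihood": -450.0, "d": 2, "k": 8},
}
scores = {name: bic(n=100, **spec) for name, spec in candidates.items()}
preferred = min(scores, key=scores.get)
```

In this made-up comparison the lower-dimensional model wins despite its worse likelihood, because the penalty grows with d times the number of matches; this is the behaviour behind the observation in the summary of results that BIC tends to underfit the dimension.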

General Comments. The BIC diverges from the full Bayesian viewpoint by
discounting the prior term in (30). The determinant of the log Hessian, $\frac{1}{2}\log|\Lambda|$,
encodes the uncertainty in the parameter estimates (the smaller this value the
greater the precision in the estimate). If available, the Hessian itself should be
calculated and used.
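Laplace's approximation (27) can be checked numerically on a one-dimensional integral whose value is known in closed form. The sketch below uses the integrand theta^a * exp(-b * theta), which integrates to Gamma(a+1) / b^(a+1); the integrand and the values a = 20, b = 4 are assumptions chosen for illustration and are not taken from the paper.

```python
import math


def laplace_log_integral(log_integrand_at_mode, covariance, p=1):
    """log of exp(peak) * (2*pi)^(p/2) * |Lambda|^(1/2), for a scalar covariance."""
    return (log_integrand_at_mode
            + 0.5 * p * math.log(2.0 * math.pi)
            + 0.5 * math.log(covariance))


a, b = 20.0, 4.0
mode = a / b                                   # maximiser of a*log(theta) - b*theta
peak = a * math.log(mode) - b * mode           # log integrand at the mode
covariance = mode ** 2 / a                     # inverse of the second derivative a/theta^2

log_laplace = laplace_log_integral(peak, covariance)
log_exact = math.lgamma(a + 1.0) - (a + 1.0) * math.log(b)
error = abs(log_laplace - log_exact)
```

For this integrand the error in the log integral is close to 1/(12a), the first correction term of Stirling's formula, so the approximation tightens as the integrand becomes more sharply peaked.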

References 

[1] M.A. Aitkin. Posterior Bayes factors. J. R. Statist. Soc. B, 53(1):111-142, 1991.

[2] H. Akaike. A new look at the statistical model identification. IEEE Trans. on
Automatic Control, Vol. AC-19(6):716-723, 1974.

[3] H. Akaike. Factor analysis and AIC. Psychometrika, 52(3):317-332, 1987.

[4] P. Beardsley, P. H. S. Torr, and A. Zisserman. 3D model acquisition from
extended image sequences. In B. Buxton and R. Cipolla, editors, Proc. 4th European
Conference on Computer Vision, LNCS 1065, Cambridge, pages 683-695.
Springer-Verlag, 1996.

[5] P. Beardsley, P. H. S. Torr, and A. Zisserman. 3D model acquisition from
extended image sequences. In B. Buxton and R. Cipolla, editors, Proc. 4th European
Conference on Computer Vision, LNCS 1065, Cambridge, pages 683-695.
Springer-Verlag, 1996.

[6] C. M. Bishop. Neural Networks for Pattern Recognition. Clarendon Press, Oxford,
1995.

[7] H. Bozdogan. Model selection and Akaike's information criterion (AIC): The
general theory and its analytical extensions. Psychometrika, 52(3):345-370, 1987.

[8] C. Chatfield. Model uncertainty, data mining and statistical inference.
J. R. Statist. Soc. A, 158:419-466, 1995.

[9] G. Csurka, C. Zeller, Z. Zhang, and O. Faugeras. Characterizing the uncertainty
of the fundamental matrix. CVIU, 68(1):18-36, 1996.

[10] M. DeGroot. Optimal Statistical Decisions. McGraw-Hill, 1970.

[11] D. Draper. Assessment and propagation of model uncertainty (with discussion).
Journal of the Royal Statistical Society series B, 57:45-97, 1995.

[12] B. Efron and R.J. Tibshirani. An Introduction to the Bootstrap. Chapman and
Hall, London, UK, 1993.

[13] O.D. Faugeras. What can be seen in three dimensions with an uncalibrated stereo
rig? In G. Sandini, editor, Proc. 2nd European Conference on Computer Vision,
LNCS 588, Santa Margherita Ligure, pages 563-578. Springer-Verlag, 1992.

[14] R. A. Fisher. Uncertain inference. Proc. Amer. Acad. Arts and Sciences, 71:245-258,
1936.







[15] A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin. Bayesian Data Analysis.
Chapman & Hall, New York, 1995.

[16] P. E. Gill, W. Murray, and M. H. Wright. Practical Optimization. Academic
Press, 1981.

[17] F. R. Hampel, E. M. Ronchetti, P. J. Rousseeuw, and W. A. Stahel. Robust
Statistics: An Approach Based on Influence Functions. Wiley, New York, 1986.

[18] C. Harris. Structure-from-motion under orthographic projection. In O. Faugeras,
editor, Proc. 1st European Conference on Computer Vision, LNCS 427, pages
118-128. Springer-Verlag, 1990.

[19] R. I. Hartley. Estimation of relative camera positions for uncalibrated cameras. In
G. Sandini, editor, Proc. 2nd European Conference on Computer Vision, LNCS 588,
Santa Margherita Ligure, pages 579-587. Springer-Verlag, 1992.

[20] R. I. Hartley and P. Sturm. Triangulation. In ARPA Image Understanding
Workshop, pages 957-966, 1994.

[21] U. Hjorth. On model selection in the computer age. J. Statist. Planning and
Inference, 23:101-115, 1989.

[22] J. S. Hodges. Uncertainty, policy analysis and statistics (with discussion).
Statistical Science, 2:259-291, 1987.

[23] P. J. Huber. Robust Statistics. John Wiley and Sons, 1981.

[24] M. Irani, P. Anandan, and D. Weinshall. From reference frames to reference
planes: Multi-view parallax geometry and its applications. In H. Burkhardt and
B. Neumann, editors, Proc. 5th European Conference on Computer Vision, LNCS
1406, Freiburg, pages 829-846. Springer-Verlag, 1998.

[25] H. Jeffreys. Theory of Probability. Clarendon Press, Oxford, third edition, 1961.

[26] K. Kanatani. Statistical Optimization for Geometric Computation: Theory and
Practice. Elsevier Science, Amsterdam, 1996.

[27] R. E. Kass and A. E. Raftery. Bayes factors. Journal of the American Statistical
Association, 90:733-795, 1995.

[28] M. Kendall and A. Stuart. The Advanced Theory of Statistics. Charles Griffin
and Company, London, 1983.

[29] A.N. Kolmogorov. Three approaches to the quantitative definition of information.
Problems of Information Transmission, 1:4-7, 1965.

[30] E. E. Leamer. Specification Searches: Ad hoc Inference with Nonexperimental Data.
Wiley, New York, 1978.

[31] I. J. Leontaritis and S. A. Billings. Model selection and validation methods for
non-linear systems. Int. J. Control, 45(1):311-341, 1987.

[32] M. Li and P. Vitanyi. An Introduction to Kolmogorov Complexity and Its
Applications. Springer-Verlag, 1997.

[33] D. V. Lindley. Approximate Bayesian methods. In J. M. Bernardo, M. H. DeGroot,
D. V. Lindley, and A. F. M. Smith, editors, Bayesian Statistics, pages
223-237, Valencia, 1980. Valencia University Press.

[34] D. Madigan and A. E. Raftery. Model selection and accounting for model
uncertainty in graphical models using Occam's window. Journal of the American
Statistical Association, 89:1535-1546, 1994.

[35] G.J. McLachlan and K. Basford. Mixture Models: Inference and Applications to
Clustering. Marcel Dekker, New York, 1988.

[36] A. J. Miller. Selection of subsets of regression variables (with discussion). Journal
of the Royal Statistical Society (Series A), 147:389-425, 1984.

[37] B. R. Moulton. A Bayesian approach to regression selection and estimation with
application to a price-index for radio services. Journal of Econometrics, 49:169-193,
1991.







[38] J. Mundy and A. Zisserman. Geometric Invariance in Computer Vision. MIT
Press, 1992.

[39] J. Pearl. On the connection between the complexity and credibility of inferred
models. Int. J. General Systems, 4:255-264, 1978.

[40] K. Pearson. On lines and planes of closest fit to systems of points in space. Philos.
Mag. Ser. 6, 2:559, 1901.

[41] B. D. Ripley. Pattern Recognition and Neural Networks. Cambridge University
Press, Cambridge, 1996.

[42] J. Rissanen. Modeling by shortest data description. Automatica, 14:465-471,
1978.

[43] P. J. Rousseeuw. Robust Regression and Outlier Detection. Wiley, New York,
1987.

[44] B. Russell. History of Western Philosophy. Routledge, 1961.

[45] K. Schittowski. NLQPL: A FORTRAN-subroutine solving constrained nonlinear
programming problems. Annals of Operations Research, 5:485-500, 1985.

[46] G. Schwarz. Estimating dimension of a model. Ann. Stat., 6:461-464, 1978.

[47] G.A.F. Seber and C. J. Wild. Non-Linear Regression. Wiley, New York, 1989.

[48] R. Solomonoff. A formal theory of inductive inference I. Information and Control,
7:1-22, 1964.

[49] M. Stone. An asymptotic equivalence of choice of model by cross-validation and
Akaike's criterion. J. Roy. Statist. Soc. B, 39:44-47, 1977.

[50] P. H. S. Torr. Outlier Detection and Motion Segmentation. PhD thesis, Dept. of
Engineering Science, University of Oxford, 1995.

[51] P. H. S. Torr. Geometric motion segmentation and model selection. In J. Lasenby,
A. Zisserman, R. Cipolla, and H. Longuet-Higgins, editors, Philosophical
Transactions of the Royal Society A, pages 1321-1340. Roy Soc, 1998.

[52] P. H. S. Torr, A. FitzGibbon, and A. Zisserman. Maintaining multiple motion
model hypotheses through many views to recover matching and structure. In
U. Desai, editor, ICCV6, pages 485-492. Narosa Publishing House, 1998.

[53] P. H. S. Torr and D. W. Murray. The development and comparison of robust
methods for estimating the fundamental matrix. Int Journal of Computer Vision,
24(3):271-300, 1997.

[54] P. H. S. Torr and D. W. Murray. The development and comparison of robust
methods for estimating the fundamental matrix. IJCV, 24(3):271-300, 1997.

[55] P. H. S. Torr and A. Zisserman. Concerning Bayesian motion segmentation, model
averaging, matching and the trifocal tensor. In H. Burkhardt and B. Neumann,
editors, ECCV98 Vol 1, pages 511-528. Springer, 1998.

[56] P. H. S. Torr and A. Zisserman. Robust computation and parametrization of
multiple view relations. In U. Desai, editor, ICCV6, pages 727-732. Narosa Publishing
House, 1998.

[57] P. H. S. Torr, A. Zisserman, and S. Maybank. Robust detection of degenerate
configurations for the fundamental matrix. CVIU, 71(3):312-333, 1998.

[58] C.S. Wallace and D.M. Boulton. An information measure for classification.
Computer Journal, 11(2):195-209, 1968.

[59] C.S. Wallace and P.R. Freeman. Estimation and inference by compact coding.
J. R. Statist. Soc. B, 49(3):240-265, 1987.

[60] Z. Zhang. Determining the epipolar geometry and its uncertainty: A review.
IJCV, 27(2):161-195, 1997.




Finding Objects by Grouping Primitives



David Forsyth, John Haddon, and Sergey Ioffe



Computer Science Division, U.C. Berkeley, Berkeley, CA 94720, USA 
daf,ioffe,haddon@cs.berkeley.edu,
http://www.cs.berkeley.edu/~daf,ioffe,haddon



Abstract. Digital library applications require very general object recog- 
nition techniques. We describe an object recognition strategy that op- 
erates by grouping together image primitives in increasingly distinctive 
collections. Once a sufficiently large group has been found, we declare 
that an object is present. We demonstrate this method on applications 
such as finding unclothed people in general images and finding horses 
in general images. Finding clothed people is difficult, because the vari- 
ation in colour and texture on the surface of clothing means that it is 
hard to find regions of clothing in the image. We show that our strategy 
can be used to find clothing by marking the distinctive shading patterns 
associated with folds in clothing, and then grouping these patterns. 



1 Background 

Several typical collections containing over ten million images are listed in [6].
There is an extensive literature on obtaining images from large collections using
features computed from the whole image, including colour histograms, texture
measures and shape measures; significant papers include [9, 13, 16, 21, 24, 25,
27, 30, 31, 36, 37, 38, 39, 42].

However, in the most comprehensive field study of usage practices (a paper
by Enser [6] surveying the use of the Hulton Deutsch collection), there is a
clear user preference for searching these collections on image semantics; typical
queries observed are overwhelmingly oriented toward object classes (“dinosaurs”,
p. 40; “chimpanzee tea party, early”, p. 41) or instances (“Harry Secombe”,
p. 44; “Edward Heath gesticulating”, p. 45). An ideal search tool would be a
quite general recognition system that could be adapted quickly and easily to the
types of objects sought by a user. Building such a tool requires a much more
sophisticated understanding of the process of recognition than currently exists.

Object recognition will not be comprehensively solved in the foreseeable
future. Solutions that are good enough to be useful for some cases in applications
are likely, however. Querying image collections is a particularly good application,
because in many cases no other query mechanism is available: there is
no prospect of searching all the photographs by hand. Furthermore, users are
typically happy with low recall queries; in fact, the output of a high recall search
for “The President” of a large news collection would be unusable for most
application purposes. This proposal focuses on areas that form a significant subset
of these queries, where useful tools can reasonably be expected.



D.A. Forsyth et al. (Eds.): Shape, Contour LNCS 1681, pp. 302-318, 1999. 
© Springer- Verlag Berlin Heidelberg 1999 







Discussing recognition requires respecting a distinction between two important
and subtly different problems: finding, where the image components that
result from a single object are collected together; and naming, where the particular
name of a single isolated object is determined. Finding is not well defined,
because objects are not well defined. For example, would one regard the image
components corresponding to an ear or an eye as separate objects that comprise
a face, or do these components belong together as part of a single indissoluble
object?



2 Primitives, Segmentation, and Implicit Representations 

Writings on object recognition have tended to concentrate on naming problems.
For some types of object or scene, finding can be avoided by quite simple
techniques. For example, for small numbers of geometrically exact object models,
search is effective [7, 14, 18, 22, 26, 28, 29, 34, 40]; and for isolated objects,
finding is irrelevant.

However, in many applications finding is an important component of the
problem; often, the name of an object is required only at a very limited level of
detail (“person”, “big cat”, etc.). While naming is not an easy problem, quite
good solutions appear possible with extensions of current pose-based techniques.

There are several reasons finding is very difficult and poorly understood. Finding
is essentially segmentation writ large, using generic cues (like coherence in
colour and texture, used by current work on segmentation) initially and high
level knowledge later to obtain regions that should be recognised together. However,
deciding which bits of the image belong together and should be recognised
together requires knowledge of object properties. As a result, finding involves
deploying object knowledge to direct and guide segmentation; but how is the
right piece of knowledge to be used in the right place? One wishes to recognize
objects at a class level, independent of geometric detail, so that finding algorithms
should be capable of abstraction. For example, most quadrupeds have roughly
the same body segments in roughly the same place; good finding algorithms
would exploit this fact before, say, measuring the distribution of musculature on
each segment or the number of hairs on an ear. Finally, a sensible approach to
finding should use representations that are robust to the effects of pose and of
internal degrees of freedom, such as joints.

We use the word primitive more loosely, to mean a feature or assembly
of features that has a constrained, stylised appearance. Then a representation
based around primitives at many levels has the great advantage that, at each
stage of finding, a program can know what it is looking for. For example, horses
can be represented (crudely!) as assemblies of hide-coloured cylinders. This
results in a finding process that first looks for hide-like regions; then finds edge
points, and uses geometrical constraints to assemble sets of edge points that
could have come from cylinders; and finally reasons about the configuration
of the cylinders. At each stage there are few alternatives to choose from, which
means the search is efficient; and, while each individual test is weak, the collective




304 David Forsyth, John Haddon, and Sergey Ioffe



of tests in sequence can be quite powerful. The choice of primitives and the order
and nature of assembly routines together form an implicit representation: a
representation of an object as a finding process which functions as a source of
top-down knowledge.

We now have some insight into what should be a primitive. Primitives should
have stereotyped appearance. The most useful form of primitive is one where
it is possible to test an assembly of image features and say whether it is likely to
have come from a primitive or not. For example, it is known that such tests are
easy for surfaces of revolution, straight homogeneous generalised cylinders, canal
surfaces, and cylinders [32, 33, 43]. As a result, it is possible to segment image
regions that are likely to correspond to such surfaces without knowing to what
object they belong^. A second feature of a useful primitive is that it is significant.
For example, a cylinder is a significant property, because many objects are, at a
crude level, made of cylinders. A third useful property is robustness; cylindrical
primitives are quite easy to find even in the presence of some deformations. These
properties mean that finding objects that are assemblies of primitives essentially
involves finding the primitives, and then reasoning about their assembly. As we
have indicated, previous work has typically concentrated on parsing activities
(which assume that finding has already occurred); this proposal concentrates on
finding.

2.1 Body Plans - Interim Results on Implicit Representations 

A natural implicit representation to use for people and many animals is a body
plan: a sequence of grouping stages, constructed to mirror the layout of body
segments. These grouping stages assemble image components that could correspond
to appropriate body segments or other components (as in figure 1, which
shows the plan used as an implicit representation of a horse). Having a sequence
of stages means the process is efficient: the process can start with checking
individual segments and move to checking multi-segment groups, so that not
all groups of four (or however many, for the relevant body plan) segments are
presented to the final classifier. We have done extensive experiments with two
separate systems that use the same structure:

- Images are masked for regions of appropriate colour and texture.

- Roughly cylindrical regions of appropriate colour and texture are identified.

- Assemblies of regions are formed and tested against a sequence of predicates.

The first example identifies pictures containing people wearing little or no
clothing, to finesse the issue of variations of appearance of clothing. This program
has been tested on an unusually large and unusually diverse set of images; on a test
collection of 565 images known to contain lightly clad people and 4289 control

^ While current techniques for finding generalised cylinders are fragile, because they 
winnow large collections of edges to find subsets with particular geometric properties 
and so are overwhelmed by images of textured objects, the principle remains. We 
indicate an attack on this difficulty below. 








Fig. 1. The body plan used for horses. Each circle represents a classifier, with
an icon indicating the appearance of the assembly. An arrow indicates that the 
classifier at the arrowhead uses segments passed by the classifier at the tail. The 
topology was given in advance. The classifiers were then trained using image data 
from a total of 38 images of horses. 



images with widely varying content, one tuning of the program marked 241 test
images and 182 control images (the performance of various different tunings is
indicated in figure 3; more detailed information appears in [12, 10]). The recall is
comparable with full-text document recall [3, 4, 35] (which is surprisingly good
for so abstract an object recognition query) and the rate of false positives is
satisfactorily low. In this case, the representation was entirely built by hand.

The second example used a representation whose combinatorial structure
(the order in which tests were applied) was built by hand, but where the tests
were learned from data. This program identified pictures containing horses, and
is described in greater detail in [11]. Tests used 100 images containing horses,
and 1086 control images with widely varying content. The geometric process
makes a significant difference, as figure 2 illustrates. The performance of various
different configurations is shown in figure 3. For version “F”, if one estimates
performance omitting images used in training and images for which the segment
finding process fails, the recall is 15% (i.e. about 15% of the images containing
horses are marked) and control images are marked at the rate of approximately










Fig. 2. Typical images with large quantities of hide-like pixels (white pixels are
not hide-like; others are hide-like) that are classified as not containing horses,
because there is no geometric configuration present. While the test of colour and
texture is helpful, the geometric test is important, too, as the results in figure 3
suggest. In particular, the fact that a horse is brown is not nearly as distinctive as
the fact that it is brown, made of cylinders, and these cylinders have a particular
set of possible arrangements.



0.65%. In our test collection, this translates to 11 images of horses marked and
4 control images marked^.

Finding using body plans has been shown to be quite effective for special
cases in quite general scenes. It is relatively insensitive to changes in aspect [10].
It is quite robust to the relatively poor segmentations that our criteria offer,
because it is quite effective in dealing with nuisance segments: in the horse
tests, the average number of four-segment groups was 2,500,000, which is an
average of forty segments per image. Nonetheless, the process described above is
crude: it is too dependent on colour and texture criteria for early segmentation;
the learning process is absent (humans) or extremely simple (horses); and there
is one recogniser per class.

3 Learning Assembly Processes from Data 

We have been studying processes for learning to assemble primitives. The
recognition processes described above have a strong component of correspondence; in

^ These figures are not 15 and 7, because of the omission of training images and images 
where the segment finder failed in estimating performance. 













Fig. 3. The response ratio, (percent incoming test images marked/percent in- 
coming control images marked), plotted against the percentage of test images 
marked, for various configurations of the two finding programs. Data for the 
nude human finder appears on the top, for the horse finder on the right. Capital 
letters indicate the performance of the complete system of skin/hide filter and 
geometrical grouper, and lower case letters indicate the performance of the geo- 
metrical grouper alone. The label “skin” (resp “hide”) indicates the selectivity of 
using skin (resp hide) alone as a criterion. For the human finder, the parameter 
varied is the type of group required to declare a human is present — the trend 
is that more complex groups display higher selectivity and lower recall. For the 
horse finder, the parameter being varied is the maximum number of that will be 
considered. 



particular, we are pruning a set of correspondences between image segments and
body segment labels by testing for kinematic plausibility.

The search for acceptable correspondences can be made efficient by using
projected classifiers, which prune labelings using the properties of smaller
sublabelings (as in [18], who use manually determined bounds and do not learn the
tests). Given a classifier C which is a function of a set of features whose values
depend on segments with labels in the set L = {l_1 ... l_m}, the projected classifier
C_{l_1...l_k} is a function of all those features that depend only on the segments with
labels L' = {l_1 ... l_k}. In particular, C_{l_1...l_k}(L') > 0 if there is some extension
L of L' such that C(L) > 0. This criterion corresponds to insisting that groups
should pass intermediate classifiers if, with appropriate segments attached, they
pass a final classifier.

The converse need not be true: the feature values required to bring a projected
point inside the positive volume of C may not be realized with any labeling of the
current set of segments 1, . . . , N. For a projected classifier to be useful, it must
be easy to compute the projection, and it must be effective in rejecting labelings
at an early stage. These are strong requirements which are not satisfied by most
good classifiers; for example, in our experience a support vector machine with a
positive definite quadratic kernel projects easily but typically yields unrestrictive
projected classifiers.
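The pruning role of a projected classifier can be sketched for the simplest case, an axis-aligned box on the feature values. A partial labelling is kept whenever every feature that can already be computed from the assigned labels lies inside its bounds, which is exactly the projection of the box onto those features. The segment encoding (position, length), the three features, the bounds and all function names below are hypothetical and chosen only for illustration.

```python
# Hypothetical features: name -> (labels the feature depends on, function of the labelling).
# Each segment is encoded as a (position, length) tuple.
FEATURES = {
    "torso_len": (("torso",), lambda lab: lab["torso"][1]),
    "arm_len": (("arm",), lambda lab: lab["arm"][1]),
    "arm_torso_gap": (("torso", "arm"),
                      lambda lab: abs(lab["torso"][0] - lab["arm"][0])),
}
BOUNDS = {"torso_len": (4.0, 6.0), "arm_len": (2.0, 4.0), "arm_torso_gap": (0.0, 1.5)}


def projected_accepts(labelling, features, bounds):
    """Box classifier restricted to features computable from the labels assigned so far."""
    for name, (needed, fn) in features.items():
        if all(label in labelling for label in needed):
            low, high = bounds[name]
            if not low <= fn(labelling) <= high:
                return False
    return True


def find_labellings(segments, labels, features, bounds, partial=None):
    """Assign segments to labels one at a time, pruning with the projected classifier."""
    partial = partial or {}
    if len(partial) == len(labels):
        return [dict(partial)]
    label = labels[len(partial)]
    found = []
    for segment in segments:
        if segment in partial.values():
            continue  # each segment may carry at most one label
        candidate = {**partial, label: segment}
        if projected_accepts(candidate, features, bounds):
            found.extend(find_labellings(segments, labels, features, bounds, candidate))
    return found


SEGMENTS = [(0.0, 5.0), (1.0, 3.0), (9.0, 3.0), (0.5, 0.5)]
RESULT = find_labellings(SEGMENTS, ["torso", "arm"], FEATURES, BOUNDS)
```

Only one of the four segments survives the single-label torso test, so the pairwise feature is evaluated for three candidate pairs rather than twelve; this is the saving that projection is meant to deliver.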








We have been using an axis-aligned bounding box, with bounds learned from
a collection of positive labellings, for a good first separation, and then using a
boosted version of a weak classifier that splits the feature space on a single
feature value (as in [1]). This yields a classifier that projects particularly well,
and allows clean and efficient algorithms for computing projected classifiers and
expanding sets of labels (see [23]).
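A boosted classifier built from single-feature splits can be sketched as follows. This is a generic AdaBoost with decision stumps written for illustration; it is not claimed to be the authors' exact training procedure, and the tiny data set and function names are assumptions.

```python
import numpy as np


def fit_stump(X, y, w):
    """Best single-feature threshold split under sample weights w; labels y are -1 or +1."""
    best = (np.inf, 0, 0.0, 1)  # (weighted error, feature index, threshold, sign)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = np.where(X[:, j] > t, sign, -sign)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, j, t, sign)
    return best


def adaboost(X, y, rounds=5):
    """Return a list of weighted stumps (alpha, feature index, threshold, sign)."""
    w = np.full(len(y), 1.0 / len(y))
    ensemble = []
    for _ in range(rounds):
        err, j, t, sign = fit_stump(X, y, w)
        err = min(max(err, 1e-12), 1.0 - 1e-12)  # keep the logarithm finite
        alpha = 0.5 * np.log((1.0 - err) / err)
        pred = np.where(X[:, j] > t, sign, -sign)
        w = w * np.exp(-alpha * y * pred)        # up-weight the misclassified samples
        w /= w.sum()
        ensemble.append((alpha, j, t, sign))
    return ensemble


def predict(ensemble, X):
    """Sign of the weighted vote of all stumps."""
    score = sum(a * np.where(X[:, j] > t, s, -s) for a, j, t, s in ensemble)
    return np.where(score > 0, 1, -1)


# Toy data: feature 0 separates the classes, feature 1 does not.
X = np.array([[0.1, 0.5], [0.2, 0.1], [0.8, 0.4], [0.9, 0.2]])
y = np.array([-1, -1, 1, 1])
ensemble = adaboost(X, y)
predictions = predict(ensemble, X)
```

Because every weak classifier depends on one feature only, the vote restricted to the features available for a partial labelling can be bounded term by term, which is one way to see why such a classifier projects well.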

The segment finder may find either 1 or 2 segments for each limb, depending
on whether it is bent or straight; because the pruning is so effective, we can
allow segments to be broken into two equal halves lengthwise, both of which are
tested.

3.1 Results 

The training set included 79 images without people, selected randomly from
the Corel database, and 274 images each with a single person on uniform
background. The images with people have been scanned from books of human
models [41]. All segments in the test images were reported; in the control images,
only segments whose interior corresponded to human skin in colour and texture
were reported. Control images, both for the training and for the test set, were
chosen so that all had at least 30% of their pixels similar to human skin in
colour and texture. This gives a more realistic test of the system performance
by excluding regions that are obviously not human, and reduces the number of
segments in the control images to the same order of magnitude as those in the
test images.

The models are all wearing either swim suits or no clothes, otherwise segment
finding fails; it is an open problem to segment people wearing loose clothing.
There is a wide variation in the poses of the training examples, although all body
segments are visible. The sets of segments corresponding to people were then
hand-labelled. Of the 274 images with people, segments for each body part were
found in 193 images. The remaining 81 resulted in incomplete configurations,
which could still be used for computing the bounding box used to obtain a first
separation. Since we assume that if a configuration looks like a person then its
mirror image would too, we double the number of body configurations by flipping
each one about a vertical axis. The bounding box is then computed from the
resulting 548 points in the feature space, without looking at the images without
people.

The boosted classifier was trained to separate two classes: the 193 x 2 = 386
points corresponding to body configurations, and 60727 points that did not
correspond to people but lay in the bounding box, obtained by using the bounding
box classifier to incrementally build labellings for the images with no people.
We added 1178 synthetic positive configurations obtained by randomly selecting
each limb and the torso from one of the 386 real images of body configurations
(which were rotated and scaled so the torso positions were the same in all of
them) to give an effect of joining limbs and torsos from different images, rather
like children's flip-books. Remarkably, the boosted classifier classified each of the







Features   # test images   # control images   False negatives   False positives
367        120             28                 37%               4%
567        120             86                 49%               10%

Table 1. Number of images of people and without people processed by the
classifiers with 367 and 567 features, compared with false negative (images with a
person where no body configuration was found) and false positive (images with
no people where a person was detected) rates.



real data points correctly but misclassified 976 out of the 1178 synthetic
configurations as negative; the synthetic examples were unexpectedly more similar to
the negative examples than the real examples were.

The test dataset was separate from the training set and included 120 images
with a person on a uniform background, and varying numbers of control images,
reported in table 1. We report results for two classifiers, one using 567
features and the other using a subset of 367 of those features. Table 1 shows
the false positive and false negative rates achieved for each of the two
classifiers. By marking 51% of test images and only 10% of control images, the
classifier using 567 features compares extremely favourably with that of [8],
which marked 54% of test images and 38% of control images using hand-tuned tests
to form groups of four segments. In 55 of the 59 images where there was a false
negative, a segment corresponding to a body part was missed by the segment
finder, meaning that the overall system performance significantly understates
the classifier performance. There are few signs of overfitting, probably because
the features are highly redundant. Using the larger set of features makes
labelling faster (by a factor of about five), because more configurations are
rejected earlier.
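As a quick check on how the rates in Table 1 relate to the figures quoted in the text, the arithmetic for the 567-feature classifier can be written out. The only assumption is that each rate is a fraction of the images in the corresponding column of the table.

```latex
\[
\begin{aligned}
\text{fraction of test images marked} &= 1 - 0.49 = 0.51,\\
\text{test images with a false negative} &\approx 0.49 \times 120 = 58.8 \approx 59,\\
\text{control images marked} &\approx 0.10 \times 86 = 8.6 \approx 9 .
\end{aligned}
\]
```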

4 Shading Primitives, Shape Representations, and 
Clothing 

Finding clothed people is a far more subtle problem than finding naked people,
because the variation in colour, texture and pattern of clothing defeats a colour
segmentation strategy. Clothing does have distinctive properties: the patterns
formed by folds on clothing appear to offer cues to the configuration of the
person underneath (as any textbook on figure drawing will illustrate). These
folds have quite distinctive shading patterns [19], which are a dominant feature
of the shading field of a person clad in a loose garment, because, although they
are geometrically small, the surface normal changes significantly at a fold.
Folds are best analysed using the theory of buckling, and arise from a variety
of causes, including excess material, as in the case of a full skirt, and
stresses on a garment caused by body configurations. Folds appear to be the
single most distinctive, reliable and general visual cue to the configuration of
a person dressed in a cotton garment.





310 David Forsyth, John Haddon, and Sergey Ioffe



4.1 Grouping Folds Using a Simple Buckling Model

Garments can be modelled as elastic shells, allowing rather simple predictions
of the pattern of folds using the von Karman-Donnell equation, or a linearised
version of that equation. This is known to be a dubious source of predictions
of buckling force, but the frequencies of the eigenfunctions (which give the
buckled solutions) are accepted as fair predictions of the buckling mode for
the cases described (this is the topic of a huge literature, introduced in [5]).

The eigenfunctions allow us to predict that garments buckling in compression or
torsion will display long, nearly straight folds that are nearly parallel and
nearly evenly spaced. These folds will be approximately perpendicular to the
direction of compression and will indicate the direction of the torsion. The
number of folds depends on tension in the garment, and is hard to predict.¹ For
torsion, reasonable estimates of a garment's size yield on the order of five
visible folds. As figures 4 and 5 indicate, these predictions are accurate
enough to drive a segmentation process.

We apply the simple fold finder described in [20] to the image at twelve
different orientations. Using these twelve response maps, we use non-maximum
suppression to find the centre of the fold, and follow this maximum along the
direction of maximum response to link all points corresponding to a single fold.
The linking process breaks at sharp corners, by considering the primary
direction of the preceding points along the fold.

After finding all of the folds in the image, the next step is to find pairs
which are approximately parallel, and in the same part of the image. If the
projections of the two folds onto their average direction are disjoint, they
are considered to belong to different parts of the image.

From the theory, we expect that multiple folds will be at regularly spaced
intervals. Thus, we look for pairs which have one common fold, and consistent
separations. (The separations should either be the same, or one should be double
the other; if a single fold gets dropped, we do not want to ignore the entire
pattern.) The separation between folds is required to be less than the maximum
length of the folds. Finally, some of these groups can be further combined, if
the groups have almost the same set of folds.
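The pairing and spacing tests just described can be sketched in a few lines. This is an illustrative sketch only: each fold is approximated here by a straight segment given by its two end points, and the tolerances (`angle_tol`, `tol`) are assumed values, not ones taken from the text.

```python
import math


def direction(fold):
    """Orientation of a fold (a segment of two end points) in [0, pi)."""
    (x0, y0), (x1, y1) = fold
    return math.atan2(y1 - y0, x1 - x0) % math.pi


def angle_diff(a, b):
    """Smallest difference between two orientations."""
    d = abs(a - b) % math.pi
    return min(d, math.pi - d)


def average_direction(f, g):
    """Average of two orientations, using the doubled-angle trick."""
    a, b = direction(f), direction(g)
    s = math.sin(2 * a) + math.sin(2 * b)
    c = math.cos(2 * a) + math.cos(2 * b)
    return (math.atan2(s, c) / 2) % math.pi


def projection_interval(fold, theta):
    """Interval covered by the fold when projected onto direction theta."""
    ux, uy = math.cos(theta), math.sin(theta)
    t = [x * ux + y * uy for (x, y) in fold]
    return min(t), max(t)


def is_pair(f, g, angle_tol=math.radians(15)):
    """Roughly parallel, and projections onto the average direction overlap."""
    if angle_diff(direction(f), direction(g)) > angle_tol:
        return False
    theta = average_direction(f, g)
    a0, a1 = projection_interval(f, theta)
    b0, b1 = projection_interval(g, theta)
    return a0 <= b1 and b0 <= a1


def length(fold):
    (x0, y0), (x1, y1) = fold
    return math.hypot(x1 - x0, y1 - y0)


def separation(f, g):
    """Distance between fold midpoints, measured across the average direction."""
    theta = average_direction(f, g)
    nx, ny = -math.sin(theta), math.cos(theta)
    mfx, mfy = (f[0][0] + f[1][0]) / 2, (f[0][1] + f[1][1]) / 2
    mgx, mgy = (g[0][0] + g[1][0]) / 2, (g[0][1] + g[1][1]) / 2
    return abs((mgx - mfx) * nx + (mgy - mfy) * ny)


def consistent(s1, s2, tol=0.2):
    """Separations are equal, or one is double the other (within tol)."""
    lo, hi = sorted((s1, s2))
    if lo == 0:
        return False
    ratio = hi / lo
    return abs(ratio - 1) <= tol or abs(ratio - 2) <= tol


def forms_group(f, g, h):
    """Two pairs sharing the common fold g, with consistent separations."""
    if not (is_pair(f, g) and is_pair(g, h)):
        return False
    s1, s2 = separation(f, g), separation(g, h)
    longest = max(length(f), length(g), length(h))
    return consistent(s1, s2) and max(s1, s2) < longest
```

Allowing a ratio of two in `consistent` is what lets a pattern survive when one fold in a regularly spaced run has been missed by the fold finder.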

The program typically extracts 10-25 groups of folds from an image. Figure 4
shows one image with three typical groups. The group in 4(b) clearly corresponds
to the major folds across the torso in the image. This is in fact a segmentation
of the image into coherent regions consisting of possible pieces of cloth. The
region covered by the folds in (b) is most of the torso of the figure, and
suggests a likely candidate for consideration as a torso. There are other groups
as well, such as (c), the Venetian blinds, and (d), an aliased version of (c),
but these extra segments are easily dealt with by higher level processes.

Any image of man-made scenes will have a number of straight parallel lines
which may have similar shading to folds (see for example figure 6). While this

¹ This can be demonstrated with a simple experiment. Wearing a loose but tucked-in
T-shirt, bend forward at the waist; the shirt hangs in a single fold. Now pull the
T-shirt taut against your abdomen and bend forward; many narrow folds form.







Fig. 4. Results of a segmenter that obtains regions by grouping folds that satisfy
the qualitative predictions of the linear buckling theory. (a) An image showing
folds corresponding to torsional buckling. (b,c,d) Three groups of folds found by
our program. The group in (b) is, in essence, the torso; it contains the major
folds across the torso, and can be used to represent the torso. An edge detector
could not extract the outline points of the torso from this image, since the
Venetian blinds would result in a mess of edges. The group of fold responses in
(c) is due to the Venetian blinds in the background. Such a large set of parallel
lines is unlikely to come from a picture of a torso, since it would require the
torso to be unrealistically long. (d) A group that is an aliased version of the
group in (c). Each group has quite high level semantics for segmenter output; in
particular, groups represent image regions that could be clothing.



may initially be interpreted as groups of folds, and hence as clothing, higher
level reasoning should enable us to reject these groups as coming from something
other than folds in cloth.



4.2 Grouping Folds by Sampling 



An alternative approach is to obtain groups which are samples from a posterior
on groups given image data. This approach has the virtue that we do not need






Fig. 5. Further examples of segmentations produced by our grouping process. 
The figures show groups of fold responses, for the torsional (b,f) and axial (d,h) 
cases. In some cases, more than one group should be fused to get the final extent
of the torso — these groups are separated by circles in the image. In each case, 
there are a series of between 10 and 25 other groups, representing either aliasing 
effects, the Venetian blinds, or other accidental events. Each group could be a 
region of clothing; more high-level information is required to tell which is and 
which is not. 




Fig. 6. There are parallel folds that appear without clothing, too. (a) An image
of an architectural curiosity. (b) One of four groups of folds found in the image.
It is certainly expected that in images of man-made scenes, there will be a large 
number of nearly-parallel lines, which may be interpreted as groups of folds. Other 
cues should allow us to determine that this is not in fact clothing. 



to come up with a detailed physical model of garment buckling, a process
complicated by cloth anisotropy, etc. A simple likelihood model can be fitted to
groups in real images, instead.







We describe each group of folds by a coordinate system and a series of
variables which describe the scale of the folds, their angle, and their location
with respect to the coordinate system. We also include the change in angle
between adjacent folds (this enables us to describe star-shaped folds). By
examining a number of groups in real images, we estimated a probability
distribution on the parameters of the coordinate system. This allows us to
describe how likely a group with those parameters is. We also estimated the
probability distribution for individual folds within a group.

The folds are grouped by running a reversible jump Markov Chain Monte Carlo
algorithm (as in [17]). If a fold has a high likelihood of belonging to a
particular group, an assignment of the fold to that group should be fairly
stable. In other words, it will have a high probability in the stationary
distribution. The assignments which appear most frequently over a large number
of iterations are taken to be the correct grouping. Proposal moves for this
grouper are:

1. Add a new group. Two folds which have not previously been assigned to
another group are combined to form a new group.

2. Delete a two-fold group.

3. Change the parameters of a group.

4. Add a fold to a group. An unassigned fold is assigned to an existing group.

5. Remove a fold from a group.

6. Change the group of a fold. Change the group assignment of a fold.
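The idea of taking the most frequently visited assignment as the grouping can be illustrated with a much simpler sampler. The sketch below is a plain Metropolis sampler over discrete fold-to-group assignments, not the reversible jump scheme of [17]: groups carry no continuous parameters, a single "reassign one fold" proposal stands in for the six moves listed above, and the log-score (`log_score`, with its `tol` and `reward` constants) is a toy assumption that simply rewards nearly parallel folds sharing a group.

```python
import math
import random
from collections import Counter


def log_score(assign, angles, tol=0.3, reward=5.0):
    """Toy log-posterior: folds sharing a group are rewarded when nearly parallel."""
    total = 0.0
    n = len(angles)
    for i in range(n):
        for j in range(i + 1, n):
            if assign[i] is not None and assign[i] == assign[j]:
                d = abs(angles[i] - angles[j])
                total += reward * (1.0 - (d / tol) ** 2)
    return total


def canonical(assign):
    """Groups of two or more folds, independent of the label names used."""
    groups = {}
    for fold, label in enumerate(assign):
        if label is not None:
            groups.setdefault(label, set()).add(fold)
    return frozenset(frozenset(g) for g in groups.values() if len(g) >= 2)


def most_popular_grouping(angles, iterations=5000, seed=0):
    """Run the sampler and return the grouping visited most often."""
    rng = random.Random(seed)
    n = len(angles)
    labels = [None] + list(range(n))      # None means "unassigned"
    assign = [None] * n
    current = log_score(assign, angles)
    visits = Counter()
    for _ in range(iterations):
        proposal = list(assign)
        proposal[rng.randrange(n)] = rng.choice(labels)   # symmetric proposal
        new = log_score(proposal, angles)
        # Metropolis acceptance: always accept improvements, sometimes accept worse.
        if rng.random() < math.exp(min(0.0, new - current)):
            assign, current = proposal, new
        visits[canonical(assign)] += 1
    return visits.most_common(1)[0][0]
```

Counting visits to `canonical(assign)` rather than to raw label vectors matters: many labellings describe the same grouping, and it is the grouping whose popularity is of interest.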

After several thousand iterations, we observe that the MCMC spends a relatively
high proportion of its time in certain states. We take the grouping in the most
popular state to be the best grouping of folds for the images. Figure 7 shows an
image and the most popular grouping of folds. (The lower level fold finder is
not yet robust enough to generate reliable folds, so the putative folds here
were marked by hand.) Parallel groups are taken to be a unit, and the edge of
the figure is largely ignored, as desired.

4.3 Choosing Primitives and Building Representations 

Clothing is an interesting case because it is not obvious that folds are the
right primitive to use. This raises the standard difficult question that any
theory based on primitives must address: how do we determine what is to be a
primitive? As a possible alternative to our current fold finder, we have been
studying a mechanism for determining what should be a primitive, following the
ideas of [1, 2]. We obtain a large set of images of regions showing regions of
folds at the same orientation and scale. There is a comparison set, containing
non-folds that are not easy to distinguish from the folds using crude methods
(e.g. a linear classifier on principal components). As measurements, we use
spatial relations between filter outputs, for a reasonable set of filters at a
variety of scales. We take a uniform sample of subimages from each set.

The task is now to explore the structure of the clothing set with respect to
the non-clothing set. We do this by setting up a decision tree; each decision








Fig. 7. The Markov Chain Monte Carlo method can be used to group folds
together. (a) The original image. (b) Folds marked by hand, but grouped
automatically. This is the most popular grouping of the image, after 10,000
iterations. Note that parallel folds are grouped together, and that the outline
of the figure is largely ignored.



attempts to split the set at the leaf using an entropy criterion. The
measurement used is the value of the output of one filter at one point; the
choice of filter and point is given by the entropy criterion. The approach can
be thought of as supervised learning of segmentation: we are training a decision
tree to separate windows associated with objects from those that are not.

We split to several levels (a total of twelve leaves in the current experiments)
and then use the representation at each leaf as a primitive. In particular, a
leaf is defined by a series of filter outputs at a series of points; at each
leaf we have an estimate of the frequency of observation of this pattern given
clothing and given no clothing. The remaining task is to postprocess the set of
primitives to remove translational redundancies.
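A single entropy-driven split of the kind described can be sketched as below. This is a minimal illustration under stated assumptions: each window is a flat list of measurements, where index `f` stands for one (filter, point) combination, labels are 1 for clothing and 0 for non-clothing, and candidate thresholds are midpoints between consecutive observed values. None of these representational choices is taken from the text.

```python
import math


def entropy(labels):
    """Binary entropy (in bits) of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    if p == 0.0 or p == 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def best_split(samples, labels):
    """Return (measurement_index, threshold, weighted_entropy) of the best split."""
    best = None
    n = len(samples)
    for f in range(len(samples[0])):
        values = sorted(set(s[f] for s in samples))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for s, y in zip(samples, labels) if s[f] <= t]
            right = [y for s, y in zip(samples, labels) if s[f] > t]
            h = (len(left) * entropy(left) + len(right) * entropy(right)) / n
            if best is None or h < best[2]:
                best = (f, t, h)
    return best
```

Applying `best_split` recursively to the two resulting subsets grows the tree; the proportion of clothing windows reaching a leaf then gives the kind of frequency estimate mentioned in the text.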



5 Conclusions 

For recognition systems to be practically useful, we need a system of
representation that can handle a reasonable level of abstraction and that can
support segmentation from quite general backgrounds. These requirements strongly
suggest representations in terms of relations between primitives. We have shown
that, using a simple primitive that is obviously convenient and useful, it is
possible to







Fig. 8. A representation of the decision tree used to find fold primitives. Each
leaf contains a few windows representative of image windows classified at that 
level; on the left, clothing, and on the right, non-clothing. Below each leaf is the 
number of clothing and non-clothing windows that arrived at that leaf, out of a 
total of 128 in each category. 110 clothing and 2 non-clothing windows arrive at 
one leaf, strongly suggesting this combination of filter outputs is an appropriate 
clothing primitive. 




Fig. 9. Folds in clothing result from buckling and have quite characteristic shad- 
ing and spatial properties, which are linked to the configuration of the person. 

(a) shows the probability that an image window centered at each point contains 
a clothing primitive, using automatically defined primitives sketched in figure 8; 

(b) shows lines of primitives linked together using an extremisation criterion. 
Note that edges are in general not marked, and that the process is insensitive to 
changes in albedo; these properties are a result of the learning process. 









build relational representations that are quite effective at finding naked
people and horses. Furthermore, we have shown that a grouping process that finds
such assemblies can be learned from data. These representations are crucially
limited by the crude primitives used.

Primitives need not just be stylised shapes. The stylised appearance of folds
in clothing means that we can study the appearance of clothing in a reasonably
effective way. These are shading primitives. Although it is currently difficult
to know how to choose primitives, the problem appears to be statistical on its
face. Statistical criteria appear to be able to suggest promising choices of
shading primitives from image data.

Acknowledgements

Thanks to Stuart Russell for focussing our attention on the attractions of MCMC
as an inference method. The discussion of body plans is based on joint work with
Margaret Fleck. Various components of this research were carried out with the
support of an NSF Digital Library grant (IRI 94-11334) and an NSF graduate
fellowship.

References 

[1] Y. Amit and D. Geman. Shape quantization and recognition with randomized 
trees. Neural computation, 9:1545-1588, 1997. 

[2] Y. Amit, D. Geman, and K. Wilder. Joint induction of shape features and tree 
classifiers. IEEE T. Pattern Analysis and Machine Intelligence, 19(11):1300-1305, 
1997. 

[3] D.C. Blair. Stairs redux: thoughts on the stairs evaluation, ten years after. J. 
American Soc. for Information Science, 47(1):4-22, 1996. 

[4] D.C. Blair and M.E. Maron. An evaluation of retrieval effectiveness for a full text 
document retrieval system. Comm. ACM, 28(3):289-299, 1985. 

[5] C.R. Calladine. Theory of shell structures. Cambridge University Press, 1983. 

[6] P.G.B. Enser. Query analysis in a visual information retrieval context.
J. Document and Text Management, 1(1):25-52, 1993.

[7] O.D. Faugeras and M. Hebert. The representation, recognition, and locating of 
3-D objects. International Journal of Robotics Research, 5(3):27-52, Fall 1986. 

[8] M. M. Fleck, D. A. Forsyth, and C. Bregler. Finding naked people. In European 
Conference on Computer Vision 1996, Vol. II, pages 592-602, 1996. 

[9] M. Flickner, H. Sawhney, W. Niblack, and J. Ashley. Query by image and video 
content: the qbic system. Computer, 28(9):23-32, 1995. 

[10] D. A. Forsyth and M. M. Fleck. Identifying nude pictures. In IEEE Workshop 
on Applications of Computer Vision 1996, pages 103-108, 1996. 

[11] D.A. Forsyth and M.M. Fleck. Body plans. In IEEE Conf. on Computer Vision 
and Pattern Recognition, 1997. 

[12] D.A. Forsyth, M.M. Fleck, and C. Bregler. Finding naked people. In European 
Conference on Computer Vision, 1996. 







[13] D.A. Forsyth, J. Malik, M.M. Fleck, H. Greenspan, T. Leung, S. Belongie,
C. Carson, and C. Bregler. Finding pictures of objects in large collections of
images. In Proc. 2nd International Workshop on Object Representation in
Computer Vision, 1996.

[14] D.A. Forsyth, J.L. Mundy, A.P. Zisserman, C. Coelho, A. Heller, and
C.A. Rothwell. Invariant descriptors for 3d object recognition and pose. PAMI,
13(10):971-991, 1991.

[15] Y. Freund and R.E. Schapire. Experiments with a new boosting algorithm. In 
Machine Learning - 13, 1996. 

[16] M.M. Gorkani and R.W. Picard. Texture orientation for sorting photos "at a
glance". In Proceedings IAPR International Conference on Pattern Recognition,
pages 459-64, 1994.

[17] P.J. Green. Reversible jump markov chain monte carlo computation and bayesian 
model determination. Biometrika, 82(4):711-732, 1995. 

[18] W.E.L. Grimson and T. Lozano-Perez. Localizing overlapping parts by searching 
the interpretation tree. IEEE Trans. Patt. Anal. Mach. Intell., 9(4):469-482, 1987. 

[19] J. Haddon and D.A. Forsyth. Shading primitives. In Int. Conf. on Computer 
Vision, 1997. 

[20] J. Haddon and D.A. Forsyth. Shape descriptions from shading primitives. In 
European Conference on Computer Vision, 1998. 

[21] A. Hampapur, A. Gupta, B. Horowitz, and Chiao-Fe Shu. Virage video engine. 
In Storage and Retrieval for Image and Video Databases V - Proceedings of the 
SPIE, volume 3022, pages 188-98, 1997. 

[22] D.P. Huttenlocher and S. Ullman. Object recognition using alignment. In Proc. 
Int. Conf. Comp. Vision, pages 102-111, London, U.K., June 1987. 

[23] S. Ioffe and D.A. Forsyth. Learning to find pictures of people. In In review — 
NIPS, 1998. 

[24] P. Lipson, W.E.L. Grimson, and P. Sinha. Configuration based scene classification 
and image indexing. In IEEE Conf. on Computer Vision and Pattern Recognition, 
pages 1007-13, 1997. 

[25] F. Liu and R.W. Picard. Periodicity, directionality, and randomness: Wold fea- 
tures for image modeling and retrieval. IEEE T. Pattern Analysis and Machine 
Intelligence, 18:722-33, 1996. 

[26] D. Lowe. Three-dimensional object recognition from single two-dimensional im- 
ages. Artificial Intelligence, 31(3):355-395, 1987. 

[27] T.P. Minka and R.W. Picard. Interactive learning with a "society of models". 
Pattern Recognition, 30:465-481, 1997. 

[28] J.L. Mundy and A. Zisserman. Geometric Invariance in Computer Vision. MIT 
Press, Cambridge, Mass., 1992. 

[29] J.L. Mundy, A. Zisserman, and D. Forsyth. Applications of Invariance in
Computer Vision, volume 825 of Lecture Notes in Computer Science.
Springer-Verlag, 1994.

[30] V.E. Ogle and M. Stonebraker. Chabot: retrieval from a relational database of 
images. Computer, 28:40-8, 1995. 

[31] R.W. Picard, T. Kabir, and F. Liu. Real-time recognition with the entire brodatz 
texture database. In IEEE Conf. on Computer Vision and Pattern Recognition, 
pages 638-9, 1993. 

[32] J. Ponce. Straight homogeneous generalized cylinders: differential geometry and 
uniqueness results. Int. J. of Comp. Vision, 4(1):79-100, 1990. 







[33] J. Ponce, D. Chelberg, and W. Mann. Invariant properties of straight
homogeneous generalized cylinders and their contours. IEEE Trans. Patt. Anal.
Mach. Intell., 11(9):951-966, September 1989.

[34] L.G. Roberts. Machine perception of three-dimensional solids. In J.T. Tippett 
et al., editors, Optical and Electro-Optical Information Processing, pages 159-197. 
MIT Press, Cambridge, 1965. 

[35] G. Salton. Another look at automatic text retrieval systems. Comm. ACM, 
29(7):649-657, 1986. 

[36] S. Santini and R. Jain. Similarity queries in image databases. In IEEE Conf. on 
Computer Vision and Pattern Recognition, pages 646-651, 1996. 

[37] M. Stricker and M.J. Swain. The capacity of color histogram indexing. In IEEE 
Conf. on Computer Vision and Pattern Recognition, pages 704-8, 1994. 

[38] M.J. Swain. Interactive indexing into image databases. In Storage and Retrieval 
for Image and Video Databases - Proceedings of the SPIE, volume 1908, pages 
95-103, 1993. 

[39] M.J. Swain and D.H. Ballard. Color indexing. Int. J. Computer Vision, 7(1):11- 
32, 1991. 

[40] D.W. Thompson and J.L. Mundy. Three-dimensional model matching from an 
unconstrained viewpoint. In IEEE Int. Conf. on Robotics and Automation, pages 
208-220, Raleigh, NC, April 1987. 

[41] unknown. Pose file, volume 1-7. Books Nippan, 1993-1996. A collection of pho- 
tographs of human models, annotated in Japanese. 

[42] D.A. White and R. Jain. Imagegrep: fast visual pattern matching in image 
databases. In Storage and Retrieval for Image and Video Databases V - Pro- 
ceedings of the SPIE, volume 3022, pages 96-107, 1997. 

[43] A. Zisserman, J.L. Mundy, D.A. Forsyth, J.S. Liu, N. Pillow, C.A. Rothwell, and 
S. Utcke. Class-based grouping in perspective images. In Int. Conf. on Computer 
Vision, 1995. 




Object Recognition with Gradient-Based Learning

Yann LeCun, Patrick Haffner, Leon Bottou, and Yoshua Bengio

AT&T Shannon Lab, 100 Schulz Drive, Red Bank, NJ 07701, USA
yann@research.att.com
http://www.research.att.com/~yann







Abstract. Finding an appropriate set of features is an essential problem
in the design of shape recognition systems. This paper attempts to show
that for recognizing simple objects with high shape variability such as
handwritten characters, it is possible, and even advantageous, to feed the
system directly with minimally processed images and to rely on learning
to extract the right set of features. Convolutional Neural Networks are
shown to be particularly well suited to this task. We also show that these
networks can be used to recognize multiple objects without requiring
explicit segmentation of the objects from their surrounding. The second
part of the paper presents the Graph Transformer Network model which
extends the applicability of gradient-based learning to systems that use
graphs to represent features, objects, and their combinations.

1 Learning the Right Features 

The most commonly accepted model of pattern recognition is composed of a
segmenter, whose role is to extract objects of interest from their background,
a hand-crafted feature extractor that gathers relevant information from the
input and eliminates irrelevant variabilities, and a classifier which
categorizes the resulting feature representations (generally vectors or strings
of symbols) into categories. There are three major methods for classification:
template matching matches the feature representation to a set of class
templates; generative methods use a probability density model for each class,
and pick the class with the highest likelihood of generating the feature
representation; discriminative models compute a discriminant function that
directly produces a score for each class. Generative and discriminative models
are often estimated (learned) from training samples. In all of these approaches,
the overall performance of the system is largely determined by the quality of
the segmenter and the feature extractor. Because they are hand-crafted, the
segmenter and feature extractor often rely on simplifying assumptions about the
input data and can rarely take into account all the variability of the real
world. An ideal solution to this problem is to feed the entire system with
minimally processed inputs (e.g. "raw" pixel images), and train it from data so
as to minimize an overall loss function (which maximizes a given performance
measure). Keeping the preprocessing to a minimum ensures that no unrealistic
assumption is made about the data. Unfortunately, that also



D.A. Forsyth et al. (Eds.): Shape, Contour LNCS 1681, pp. 319-345, 1999. 
© Springer-Verlag Berlin Heidelberg 1999 




320 Yann LeCun et al.



requires to come up with a suitable learning architecture that can handle the
high dimension of the input (number of pixels), the high degree of variability
due to pose variations or geometric distortions among other things, and the
necessarily complex non-linear relation between the input and the output.

Gradient-Based Learning provides a framework in which to build such a system.
The learning machine computes a function $Y^p = F(Z^p, W)$, where $Z^p$ is the
p-th input pattern and $W$ represents the collection of adjustable parameters
in the system. The output $Y^p$ contains scores or probabilities for each
category. A loss function $E^p = D(D^p, F(Z^p, W))$ measures the discrepancy
between $D^p$, the "correct" output for pattern $Z^p$, and the output produced
by the system. The average loss function $E_{train}(W)$ is the average of the
errors $E^p$ over a set of labeled examples called the training set
$\{(Z^1, D^1), \ldots, (Z^P, D^P)\}$. In the simplest setting, the learning
problem consists in finding the value of $W$ that minimizes $E_{train}(W)$.

Making the loss function differentiable with respect to $W$ ensures that
efficient gradient-based non-linear optimization methods can be used to find a
minimum. To ensure global differentiability, the system is built as a
feed-forward network of modules. In the simplest case, each module computes a
function $X_n = F_n(W_n, X_{n-1})$, where $X_n$ is an object (a vector in the
simplest case) representing the output of the module, $W_n$ is the vector of
tunable (trainable) parameters in the module (a subset of $W$), and $X_{n-1}$
is the module's input (as well as the previous module's output). The input
$X_0$ to the first module is the system's input pattern $Z^p$.

The main idea of Gradient-Based Learning, which is a simple extension of the
well-known back-propagation neural network learning algorithm, is that the
objective function can be efficiently minimized through gradient descent (or
other more sophisticated non-linear optimization methods) because the gradient
of $E$ with respect to $W$ can be efficiently computed with a backward
recurrence through the network of modules. If the partial derivative of $E^p$
with respect to $X_n$ is known, then the partial derivatives of $E^p$ with
respect to $W_n$ and $X_{n-1}$ can be computed using the following backward
recurrence:



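\[
\begin{aligned}
\frac{\partial E^p}{\partial W_n} &= \frac{\partial F_n}{\partial W}(W_n, X_{n-1}) \, \frac{\partial E^p}{\partial X_n} \\
\frac{\partial E^p}{\partial X_{n-1}} &= \frac{\partial F_n}{\partial X}(W_n, X_{n-1}) \, \frac{\partial E^p}{\partial X_n}
\end{aligned}
\tag{1}
\]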



where $\frac{\partial F_n}{\partial W}(W_n, X_{n-1})$ is the Jacobian of $F_n$
with respect to $W$ evaluated at the point $(W_n, X_{n-1})$, and
$\frac{\partial F_n}{\partial X}(W_n, X_{n-1})$ is the Jacobian of $F_n$ with
respect to $X$. The first equation computes some terms of the gradient of
$E^p(W)$, while the second equation propagates the partial gradients backward.
The idea can be trivially extended to any network of functional modules. A
completely rigorous derivation of the gradient propagation procedure in the
general case can be done using Lagrange functions [14, 15, 2].
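The backward recurrence can be made concrete with a very small network of modules. The following is a sketch under stated assumptions, not the chapter's implementation: the module classes `Linear` and `Sigmoid`, the squared-error loss, and the use of plain Python lists as state vectors are all choices made for illustration. Each module's `backward` receives the gradient of the loss with respect to its output, stores the gradient with respect to its own parameters, and returns the gradient with respect to its input, which is exactly the pair of equations in the recurrence.

```python
import math


class Linear:
    """Module X_n = W_n X_{n-1} with a trainable weight matrix."""

    def __init__(self, weights):
        self.W = [row[:] for row in weights]

    def forward(self, x):
        self.x = x[:]                       # cache the input for the backward pass
        return [sum(w * v for w, v in zip(row, x)) for row in self.W]

    def backward(self, grad_out):
        # First equation: gradient with respect to this module's parameters.
        self.grad_W = [[g * v for v in self.x] for g in grad_out]
        # Second equation: gradient propagated to the previous module's output.
        return [sum(self.W[i][j] * grad_out[i] for i in range(len(self.W)))
                for j in range(len(self.x))]


class Sigmoid:
    """Component-wise logistic function; it has no trainable parameters."""

    def forward(self, x):
        self.y = [1.0 / (1.0 + math.exp(-v)) for v in x]
        return self.y[:]

    def backward(self, grad_out):
        return [y * (1.0 - y) * g for y, g in zip(self.y, grad_out)]


def loss(output, target):
    """Squared-error loss (an assumed choice of discrepancy measure)."""
    return 0.5 * sum((o - t) ** 2 for o, t in zip(output, target))


def forward(modules, z):
    x = z
    for m in modules:
        x = m.forward(x)
    return x


def backward(modules, output, target):
    grad = [o - t for o, t in zip(output, target)]   # gradient of the loss at the output
    for m in reversed(modules):
        grad = m.backward(grad)
    return grad                                       # gradient with respect to the input
```

A finite-difference comparison on any single weight is a cheap way to confirm that a hand-written `backward` agrees with its `forward`.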







2 Shape Recognition with Convolutional Neural 
Networks 

Traditional multi-layer neural networks are a special case of the above where
the states $X_n$ are fixed-size vectors, and where the modules are alternated
layers of matrix multiplications (the weights) and component-wise sigmoid
functions (the units). Traditional multilayer neural nets where all the units
in a layer are connected to all the units in the next layer can be used to
recognize raw (roughly size-normalized and centered) images, but there are
problems.

Firstly, typical images are large, often with several hundred variables
(pixels). A fully-connected network with, say, 100 units in the first layer
would already contain several tens of thousands of weights. Such a large number
of parameters increases the capacity of the system and therefore requires a
larger training set. But the main deficiency of unstructured nets is that they
have no built-in invariance with respect to translations, scale, or geometric
distortions of the inputs. Images of objects can be approximately
size-normalized and centered, but no such preprocessing can be perfect. This,
combined with intrinsic within-class variability, will cause variations in the
position of distinctive features in input objects. In principle, a
fully-connected network of sufficient size could learn to produce outputs that
are invariant with respect to such variations. However, learning such a task
would probably result in multiple units with similar weight patterns positioned
at various locations in the input so as to detect distinctive features wherever
they appear on the input. Learning these weight configurations requires a very
large number of training instances to cover the space of possible variations.
In convolutional networks, described below, the robustness to geometric
distortions is automatically obtained by forcing the replication of weight
configurations across space.
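To make the size of a fully-connected first layer concrete, consider a worked count. The 20 by 20 input size is an illustrative assumption chosen only to give "several hundred" pixels; the 100 units come from the text.

```latex
\[
\underbrace{20 \times 20}_{\text{pixels}} \times \underbrace{100}_{\text{units}}
= 400 \times 100 = 40\,000 \ \text{weights},
\qquad \text{plus } 100 \ \text{biases}.
\]
```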

Secondly, a deficiency of fully-connected architectures is that the topology
of the input is entirely ignored. The input variables can be presented in any
(fixed) order without affecting the outcome of the training. On the contrary,
images have a strong 2D local structure: variables (pixels) that are spatially
nearby are highly correlated. Local correlations are the reasons for the
well-known advantages of extracting and combining local features before
recognizing spatial or temporal objects, because configurations of neighboring
variables can be classified into a small number of relevant categories (e.g.
edges, corners). Convolutional Networks force the extraction of local features
by restricting the receptive fields of hidden units to be local.



2.1 Convolutional Networks 

Convolutional Networks combine three architectural ideas to ensure some degree
of shift, scale, and distortion invariance: local receptive fields, shared
weights (or weight replication), and spatial sub-sampling. A typical
convolutional network for recognizing shapes, dubbed LeNet-5, is shown in
figure 1. The input plane receives images of objects that are approximately
size-normalized and centered.







[Figure 1: network diagram. Recoverable labels: C3: f. maps 16@10x10; stages:
Convolutions, Subsampling, Convolutions, Subsampling, Full connection.]

Fig. 1. Architecture of LeNet-5, a Convolutional Neural Network, here for digits
recognition. Each plane is a feature map, i.e. a set of units whose weights are
constrained to be identical.



Each unit in a layer receives inputs from a set of units located in a small
neighborhood in the previous layer. The idea of connecting units to local
receptive fields on the input goes back to the early 60s, and was largely
inspired by Hubel and Wiesel's discovery of locally-sensitive,
orientation-selective neurons in the cat's visual system [9]. Local connections
have been used many times in neural models of visual learning [7, 13, 16, 23].
With local receptive fields, neurons can learn to extract elementary visual
features such as oriented edges, end-points, corners (or similar features in
other signals such as speech spectrograms). These features are then combined by
the subsequent layers in order to detect higher-order features. As stated
earlier, distortions or shifts of the input can cause the position of salient
features to vary. In addition, elementary feature detectors that are useful on
one part of the image are likely to be useful across the entire image. This
knowledge can be applied by forcing a set of units, whose receptive fields are
located at different places on the image, to have identical weight vectors
[2, 16]. Units in a layer are organized in planes within which all the units
share the same set of weights. The set of outputs of the units in such a plane
is called a feature map. Units in a feature map are all constrained to perform
the same operation on different parts of the image. A complete convolutional
layer is composed of several feature maps (with different weight vectors), so
that multiple features can be extracted at each location. A concrete example of
this is the first layer of LeNet-5 shown in Figure 1. Units in the first hidden
layer of LeNet-5 are organized in 6 planes, each of which is a feature map. A
unit in a feature map has 25 inputs connected to a 5 by 5 area in the input,
called the receptive field of the unit. Each unit has 25 inputs, and therefore
25 trainable coefficients plus a trainable bias. The receptive fields of
contiguous units in a feature map are centered on correspondingly contiguous
units in the previous layer. Therefore receptive fields of neighboring units
overlap. For example, in the first hidden layer of LeNet-5, the receptive
fields of horizontally contiguous units overlap by 4 columns and 5 rows. As
stated earlier, all the units in a feature map share the same set of 25 weights
and the same bias, so they detect the same feature at all possible locations on
the input. The other feature maps in the layer use different








sets of weights and biases, thereby extracting different types of local
features. In the case of LeNet-5, at each input location six different types of
features are extracted by six units in identical locations in the six feature
maps. A sequential implementation of a feature map would scan the input image
with a single unit that has a local receptive field, and store the states of
this unit at corresponding locations in the feature map. This operation is
equivalent to a convolution, followed by an additive bias and squashing
function, hence the name convolutional network. The kernel of the convolution
is the set of connection weights used by the units in the feature map. An
interesting property of convolutional layers is that if the input image is
shifted, the feature map output will be shifted by the same amount, but will be
left unchanged otherwise. This property is at the basis of the robustness of
convolutional networks to shifts and distortions of the input.
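The "sequential implementation" of a feature map, and the shift property it implies, can be sketched directly. This is an illustrative sketch only: images and kernels are plain nested lists, `tanh` is an assumed choice of squashing function, and the scan is written as a sliding weighted sum without flipping the kernel (the usual convention in neural network code). The function names are hypothetical.

```python
import math


def feature_map(image, kernel, bias):
    """Scan the image with one unit: weighted sum over the receptive field,
    plus a bias, passed through a squashing function."""
    kh, kw = len(kernel), len(kernel[0])
    rows = len(image) - kh + 1
    cols = len(image[0]) - kw + 1
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            s = bias
            for i in range(kh):
                for j in range(kw):
                    s += kernel[i][j] * image[r + i][c + j]
            row.append(math.tanh(s))
        out.append(row)
    return out


def trainable_parameters(kernel_rows, kernel_cols, num_maps):
    """Shared weights plus one bias per feature map."""
    return num_maps * (kernel_rows * kernel_cols + 1)


# Shifting the input by one column shifts the feature map by one column.
image = [[0.0] * 8 for _ in range(8)]
image[2][2] = 1.0
shifted = [[0.0] * 8 for _ in range(8)]
shifted[2][3] = 1.0
kernel = [[0.1 * (i + j) for j in range(3)] for i in range(3)]
original_map = feature_map(image, kernel, 0.05)
shifted_map = feature_map(shifted, kernel, 0.05)
```

With six feature maps and 5 by 5 receptive fields, `trainable_parameters(5, 5, 6)` counts 25 weights and one bias per map, which is far fewer parameters than a fully-connected layer over the same input would need.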

Once a feature has been detected, its exact location becomes less important.
Only its approximate position relative to other features is relevant. Using
handwritten digits as an example, once we know that the input image contains the
endpoint of a roughly horizontal segment in the upper left area, a corner in the
upper right area, and the endpoint of a roughly vertical segment in the lower
portion of the image, we can tell the input image is a 7. Not only is the precise
position of each of those features irrelevant for identifying the pattern, it is
potentially harmful because the positions are likely to vary for different
instances of the shape. A simple way to reduce the precision with which the
position of distinctive features is encoded in a feature map is to reduce the
spatial resolution of the feature map. This can be achieved with so-called
sub-sampling layers, which perform a local averaging and a sub-sampling, reducing
the resolution of the feature map, and reducing the sensitivity of the output to
shifts and distortions. The second hidden layer of LeNet-5 is a sub-sampling
layer. This layer comprises six feature maps, one for each feature map in the
previous layer. The receptive field of each unit is a 2 by 2 area in the previous
layer's corresponding feature map. Each unit computes the average of its four
inputs, multiplies it by a trainable coefficient, adds a trainable bias, and
passes the result through a sigmoid function. Contiguous units have
non-overlapping contiguous receptive fields. Consequently, a sub-sampling layer
feature map has half the number of rows and columns as the feature maps in the
previous layer. The trainable coefficient and bias control the effect of the
sigmoid non-linearity. If the coefficient is small, then the unit operates in a
quasi-linear mode, and the sub-sampling layer merely blurs the input. If the
coefficient is large, sub-sampling units can be seen as performing a "noisy OR"
or a "noisy AND" function depending on the value of the bias. Successive layers
of convolutions and sub-sampling are typically alternated, resulting in a
"bi-pyramid": at each layer, the number of feature maps is increased as the
spatial resolution is decreased. Each unit in the third hidden layer in figure 1
may have input connections from several feature maps in the previous layer. The
convolution/sub-sampling combination, inspired by Hubel and Wiesel's notions of
"simple" and "complex" cells, was implemented in Fukushima's Neocognitron, though
no globally supervised learning procedure such as back-propagation was available
then. A large degree of invariance







to geometric transformations of the input can be achieved with this progressive
reduction of spatial resolution compensated by a progressive increase of the
richness of the representation (the number of feature maps).
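The sub-sampling unit described above (average of four inputs, trainable coefficient, trainable bias, sigmoid) can be sketched as follows. This is an illustration under assumptions: the logistic function stands in for the unspecified sigmoid, and the 4x4 input and parameter values are arbitrary.

```python
import math

def subsample(fmap, coeff, bias):
    """2x2 non-overlapping average, scaled by `coeff`, offset by `bias`, squashed."""
    out = []
    for r in range(0, len(fmap) - 1, 2):
        row = []
        for c in range(0, len(fmap[0]) - 1, 2):
            avg = (fmap[r][c] + fmap[r][c + 1]
                   + fmap[r + 1][c] + fmap[r + 1][c + 1]) / 4.0
            # Logistic sigmoid is an assumption; the text only says "sigmoid".
            row.append(1.0 / (1.0 + math.exp(-(coeff * avg + bias))))
        out.append(row)
    return out

# Arbitrary 4x4 feature map; one coefficient and one bias for the whole map.
fmap = [[float((r * 4 + c) % 5) for c in range(4)] for r in range(4)]
pooled = subsample(fmap, coeff=0.5, bias=0.0)
```

The output has half the number of rows and columns of the input. With a small `coeff` the sigmoid stays in its near-linear range, which corresponds to the blurring regime mentioned in the text.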

Since all the weights are learned with back-propagation, convolutional networks
can be seen as synthesizing their own feature extractors, and tuning them to the
task at hand. The weight sharing technique has the interesting side effect of
reducing the number of free parameters, thereby reducing the "capacity" of the
machine and reducing the gap between test error and training error [16]. The
network in figure 1 contains 345,308 connections, but only 60,000 trainable free
parameters because of the weight sharing.

Fixed-size Convolutional Networks have been applied to many applications, among
others handwriting recognition [121], as well as machine-printed character
recognition [32], on-line handwriting recognition [1], and face recognition [12].
Fixed-size convolutional networks that share weights along a single temporal
dimension are known as Time-Delay Neural Networks (TDNNs) and applied widely in
speech processing and time-series prediction. Variable-size convolutional
networks, which have applications in object detection and location, are
described in section 3.



2.2 LeNet-5 

This section describes in more detail the architecture of LeNet-5, the
Convolutional Neural Network used in the experiments. LeNet-5 comprises 7 layers,
not counting the input, all of which contain trainable parameters (weights). The
input is a 32x32 pixel image. Input shapes should be significantly smaller than
that (e.g. on the order of 20x20 pixels). The reason is that it is desirable that
potential distinctive features such as end-points or corners can appear in the
center of the receptive field of the highest-level feature detectors. In LeNet-5
the set of centers of the receptive fields of the last convolutional layer (C3,
see below) form a 20x20 area in the center of the 32x32 input. The values of the
input pixels are normalized so that the background level (white) corresponds to
a value of -0.1 and the foreground (black) corresponds to 1.175. This makes the
mean input roughly 0, and the variance roughly 1, which accelerates learning
[20]. In the following, convolutional layers are labeled Cx, sub-sampling layers
are labeled Sx, and fully-connected layers are labeled Fx, where x is the layer
index.
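As a small illustration of the input normalization just described, the sketch below maps a grey level to the stated range. The convention that raw grey levels lie in [0, 1] with 0 for white and 1 for black is an assumption made for the example.

```python
BACKGROUND = -0.1   # value assigned to white pixels
FOREGROUND = 1.175  # value assigned to black pixels

def normalize(pixel):
    """Map a grey level in [0, 1] (0 = white, 1 = black) to the network input range."""
    return BACKGROUND + (FOREGROUND - BACKGROUND) * pixel

def mean_input(ink_fraction):
    """Mean input value of a bilevel image whose fraction of black pixels is given."""
    return normalize(0.0) * (1.0 - ink_fraction) + normalize(1.0) * ink_fraction

# Fraction of black pixels at which the mean input is exactly zero.
zero_mean_fraction = -BACKGROUND / (FOREGROUND - BACKGROUND)
```

Under these assumptions, the mean input is close to zero when a little under 8% of the pixels are black.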

Layer C1 is a convolutional layer with 6 feature maps. Each unit in each feature
map is connected to a 5x5 neighborhood in the input. The size of the feature
maps is 28x28, which prevents connections from the input from falling off the
boundary. C1 contains 156 trainable parameters and 122,304 connections.

Layer S2 is a sub-sampling layer with 6 feature maps of size 14x14. Each unit in
each feature map is connected to a 2x2 neighborhood in the corresponding feature
map in C1. The four inputs to a unit in S2 are added, then multiplied by a
trainable coefficient, and added to a trainable bias. The result is passed
through a sigmoidal function. The 2x2 receptive fields are non-overlapping,
therefore







feature maps in S2 have half the number of rows and columns as feature maps in
C1. Layer S2 has 12 trainable parameters and 5,880 connections.

Layer C3 is a convolutional layer with 16 feature maps. Each unit in each feature
map is connected to several 5x5 neighborhoods at identical locations in a subset
of S2's feature maps. Why not connect every S2 feature map to every C3 feature
map? The reason is twofold. First, a non-complete connection scheme keeps the
number of connections within reasonable bounds. More importantly, it forces a
break of symmetry in the network. Different feature maps are forced to extract
different (hopefully complementary) features because they get different sets of
inputs. The rationale behind the connection scheme is the following. The first
six C3 feature maps take inputs from every contiguous subset of three feature
maps in S2. The next six take input from every contiguous subset of four. The
next three take input from some discontinuous subsets of four. Finally, the last
one takes input from all S2 feature maps. The full connection table is given in
[19]. Layer C3 has 1,516 trainable parameters and 156,000 connections.

Layer S4 is a sub-sampling layer with 16 feature maps of size 5x5. Each unit in
each feature map is connected to a 2x2 neighborhood in the corresponding feature
map in C3, in a similar way as C1 and S2. Layer S4 has 32 trainable parameters
and 2,000 connections.

Layer C5 is a convolutional layer with 120 feature maps. Each unit is connected
to a 5x5 neighborhood on all 16 of S4's feature maps. Here, because the size of
S4 is also 5x5, the size of C5's feature maps is 1x1: this amounts to a full
connection between S4 and C5. C5 is labeled as a convolutional layer, instead of
a fully-connected layer, because if LeNet-5 input were made bigger with
everything else kept constant, the feature map dimension would be larger than
1x1. This process of dynamically increasing the size of a convolutional network
is described in Section 3. Layer C5 has 48,120 trainable connections.

Layer F6 contains 84 units (the reason for this number comes from the design of
the output layer, explained later) and is fully connected to C5. It has 10,164
trainable parameters.
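Most of the parameter and connection counts quoted for the layers above follow from simple arithmetic, sketched below. One convention is assumed here: a unit's connection count includes its bias, which is what reproduces the figures given for C1, S2 and S4. The C3 connection count is not derived in this sketch.

```python
# C1: 6 maps, each unit sees a 5x5 patch of the single input plane.
c1_params = 6 * (5 * 5 + 1)                 # shared weights + one bias per map
c1_conns = 6 * 28 * 28 * (5 * 5 + 1)        # 28x28 units per map

# S2: one trainable coefficient and one bias per map, 2x2 inputs per unit.
s2_params = 6 * 2
s2_conns = 6 * 14 * 14 * (2 * 2 + 1)

# C3: six maps fed by 3 S2 maps, nine maps fed by 4, one map fed by all 6.
c3_params = 6 * (3 * 25 + 1) + 9 * (4 * 25 + 1) + 1 * (6 * 25 + 1)

# S4: same scheme as S2, with 16 maps of size 5x5.
s4_params = 16 * 2
s4_conns = 16 * 5 * 5 * (2 * 2 + 1)

# C5: 120 units, each connected to a 5x5 patch on all 16 S4 maps, plus a bias.
c5_params = 120 * (16 * 25 + 1)

# F6: 84 units fully connected to the 120 outputs of C5, plus a bias each.
f6_params = 84 * (120 + 1)

total_params = (c1_params + s2_params + c3_params
                + s4_params + c5_params + f6_params)
```

Summing the six layers gives exactly 60,000 trainable parameters, which agrees with the total quoted earlier for the whole network.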

As in classical neural networks, units in layers up to F6 compute a dot product
between their input vector and their weight vector, to which a bias is added.
This weighted sum is then passed through a scaled hyperbolic tangent function
to produce the state of the unit.

Finally, the output layer is composed of Euclidean Radial Basis Function units
(RBF), one for each class, with 84 inputs each. Each output RBF unit computes
the Euclidean distance between its input vector and its parameter vector. The
output of a particular RBF can be interpreted as a penalty term measuring the
fit between the input pattern and a model of the class associated with the RBF.
Given an input pattern, the loss function should be designed so as to get the
configuration of F6 as close as possible to the parameter vector of the RBF that
corresponds to the pattern's desired class. The parameter vectors of these units
were chosen by hand and kept fixed (at least initially). The components of those
parameter vectors were set to -1 or +1, to predetermined







values. The parameter vectors of the RBFs play the role of target vectors for
layer F6.
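The role of the RBF output units as penalties can be sketched on a toy case. Several things here are assumptions for illustration: the squared Euclidean distance is used as the penalty, the state has 4 components instead of 84, and the three prototype vectors are invented.

```python
def rbf_penalties(state, prototypes):
    """Squared Euclidean distance from the F6 state to each class prototype."""
    return [sum((x - w) ** 2 for x, w in zip(state, proto))
            for proto in prototypes]

# Hypothetical prototypes with components set to -1 or +1, one per class.
prototypes = [
    [1, 1, -1, -1],
    [1, -1, 1, -1],
    [-1, -1, 1, 1],
]

# Hypothetical F6 state for one input pattern.
state = [0.9, -0.8, 1.0, -0.7]

penalties = rbf_penalties(state, prototypes)
# The recognized class is the one whose RBF gives the lowest penalty.
predicted = min(range(len(penalties)), key=penalties.__getitem__)
```

Training then amounts to pushing the F6 state toward the prototype of the desired class, so that the corresponding penalty becomes small.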

The simplest output loss function that can be used with the above network
is:

$$E(W) = \frac{1}{P} \sum_{p=1}^{P} y_{D^p}(Z^p, W) \qquad (2)$$

where $y_{D^p}$ is the output of the $D^p$-th RBF unit, i.e. the one that
corresponds to the correct class of input pattern $Z^p$. The actual loss function
used in our experiments has an additional term to make it more discriminative.
More details are available in [19]. Computing the gradient of the loss function
with respect to all the weights in all the layers of the convolutional network
is done with back-propagation. The standard algorithm must be slightly modified
to take account of the weight sharing. An easy way to implement it is to first
compute the partial derivatives of the loss function with respect to each
connection, as if the network were a conventional multi-layer network without
weight sharing. Then the partial derivatives of all the connections that share
a same parameter are added to form the derivative with respect to that
parameter.
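The two-step recipe for gradients under weight sharing can be illustrated on a one-dimensional toy layer. This is a sketch under assumptions: a linear layer without bias or squashing, a quadratic loss on its outputs, and arbitrary input and weight values.

```python
def forward(x, w):
    """One shared-weight 'feature map' in 1-D: y[i] = sum_j w[j] * x[i + j]."""
    n = len(x) - len(w) + 1
    return [sum(w[j] * x[i + j] for j in range(len(w))) for i in range(n)]

def loss(x, w):
    """Toy loss: half the sum of squared outputs."""
    return 0.5 * sum(y * y for y in forward(x, w))

def shared_gradient(x, w):
    y = forward(x, w)
    # Step 1: one partial derivative per connection (unit i, tap j),
    # as if every connection had its own private weight.
    per_connection = [[y[i] * x[i + j] for j in range(len(w))]
                      for i in range(len(y))]
    # Step 2: add the derivatives of all connections that share parameter j.
    return [sum(per_connection[i][j] for i in range(len(y)))
            for j in range(len(w))]

def numeric_gradient(x, w, eps=1e-6):
    """Central finite differences, used only to check the result."""
    grad = []
    for j in range(len(w)):
        up = w[:j] + [w[j] + eps] + w[j + 1:]
        down = w[:j] + [w[j] - eps] + w[j + 1:]
        grad.append((loss(x, up) - loss(x, down)) / (2 * eps))
    return grad

x = [0.2, -0.5, 1.0, 0.3, -0.7, 0.4]
w = [0.1, -0.3, 0.5]
grad = shared_gradient(x, w)
check = numeric_gradient(x, w)
```

The summed per-connection derivatives in `grad` agree with the finite-difference estimate in `check`, which is the property the text relies on.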

2.3 An Example: Recognizing Handwritten Digits 

Recognizing individual digits is an excellent benchmark for comparing shape
recognition methods. This comparative study concentrates on adaptive methods
that operate directly on size-normalized images. Handwritten digit recognition
may seem a little simplistic when one's interest is Computer Vision, but the
simplicity is only apparent, and the problems to solve are essentially the same
as with any 2D shape recognition, only there is abundant training data available,
and the intra-class shape variability is considerably larger than with any rigid
object recognition problem.

The database used to train and test the systems described in this paper was
constructed from the NIST's Special Database 3 and 1 containing binary images
of handwritten digits. From these we built a database called MNIST, which
contains 60,000 training samples (half from SD1, half from SD3), and 10,000 test
images (half from SD1 and half from SD3). The original black and white (bilevel)
images were size normalized to fit in a 20x20 pixel box while preserving their
aspect ratio. The resulting images contain grey levels as a result of the
anti-aliased resampling. Three versions of the database were used. In the first
version, the images were centered in a 28x28 image by computing the center of
mass of the pixels, and translating the image so as to position this point at
the center of the 28x28 field. In some instances, this 28x28 field was extended
to 32x32 with background pixels. This version of the database will be referred
to as the regular database. In the second version of the database (referred to
as the deslanted version), the character images were deslanted using the moments
of inertia of the black pixels and cropped down to 20x20 pixels images. In the
third version of the database, used in some early experiments, the images were
reduced to 16x16 pixels. The regular database is available at
http://www.research.att.com/~yann
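The centering step used for the regular database can be sketched as follows. This is only an illustration: it assumes background pixels are 0 and ink is positive, and it translates by whole pixels, whereas the original preprocessing may have handled sub-pixel offsets differently.

```python
def center_of_mass(img):
    """Return (row, column) of the intensity-weighted center of `img`."""
    total = sum(sum(row) for row in img)
    cy = sum(r * v for r, row in enumerate(img) for v in row) / total
    cx = sum(c * v for row in img for c, v in enumerate(row)) / total
    return cy, cx

def center_in_field(img, size=28):
    """Place `img` in a size x size field with its center of mass at the center."""
    cy, cx = center_of_mass(img)
    # Whole-pixel offsets that move the center of mass to the field center.
    dy = round((size - 1) / 2.0 - cy)
    dx = round((size - 1) / 2.0 - cx)
    out = [[0.0] * size for _ in range(size)]
    for r, row in enumerate(img):
        for c, v in enumerate(row):
            if 0 <= r + dy < size and 0 <= c + dx < size:
                out[r + dy][c + dx] = v
    return out

# Toy 20x20 "digit": a 2x2 blob of ink near the top-left corner.
digit = [[0.0] * 20 for _ in range(20)]
for r in (2, 3):
    for c in (5, 6):
        digit[r][c] = 1.0

centered = center_in_field(digit)
new_cy, new_cx = center_of_mass(centered)
```

After the translation, the center of mass of the toy blob sits at the middle of the 28x28 field.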







Fig. 2. Examples from the test set (left), and examples of distortions of ten
training patterns (right)



2.4 Results and Comparison with Other Classifiers 

Several versions of LeNet-5 were trained on the regular database, with typically
20 iterations through the entire training data. The test error rate stabilizes
after around 10 passes through the training set at 0.95%. The error rate on the
training set reaches 0.35% after 19 passes. The influence of the training set
size was measured by training the network with 15,000, 30,000, and 60,000
examples. The results made it clear that additional training data would be
beneficial. In another set of experiments, we artificially generated more
training examples by randomly distorting the original training images. The
increased training set was composed of the 60,000 original patterns plus 540,000
instances of distorted patterns with randomly picked distortion parameters. The
distortions were combinations of the following planar affine transformations:
horizontal and vertical translations, scaling, squeezing (simultaneous
horizontal compression and vertical elongation, or the reverse), and horizontal
shearing. Figure 2 shows examples of distorted patterns used for training. When
distorted data was used for training, the test error rate dropped to 0.8% (from
0.95% without deformation). Some of the misclassified examples are genuinely
ambiguous, but several are perfectly identifiable by humans, although they are
written in an under-represented style. This shows that further improvements are
to be expected with more training data.
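The family of distortions listed above can be written as a single planar affine map. The sketch below composes scaling, squeezing, horizontal shearing and translation; the parameter ranges in `random_distortion` are hypothetical, since the text does not state the ranges that were actually used.

```python
import random

def affine_matrix(tx=0.0, ty=0.0, scale=1.0, squeeze=1.0, shear=0.0):
    """Return (a, b, c, d, e, f) with x' = a*x + b*y + c and y' = d*x + e*y + f.

    squeeze > 1 compresses horizontally and elongates vertically (and the
    reverse for squeeze < 1); shear slants the pattern horizontally.
    """
    sx = scale / squeeze
    sy = scale * squeeze
    # Order of composition: axis scaling, then horizontal shear, then translation.
    return (sx, shear * sy, tx, 0.0, sy, ty)

def transform_point(m, x, y):
    a, b, c, d, e, f = m
    return (a * x + b * y + c, d * x + e * y + f)

def random_distortion(rng):
    """Draw one distortion; all ranges below are illustrative assumptions."""
    return affine_matrix(
        tx=rng.uniform(-2.0, 2.0),
        ty=rng.uniform(-2.0, 2.0),
        scale=rng.uniform(0.9, 1.1),
        squeeze=rng.uniform(0.9, 1.1),
        shear=rng.uniform(-0.2, 0.2),
    )

rng = random.Random(0)
m = random_distortion(rng)
```

A distorted training image would then be produced by resampling the original image through the inverse of such a map, a step that is left out of this sketch.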

For the sake of comparison, a variety of other trainable classifiers was trained
and tested on the same database. The error rates on the test set for the
various methods are shown in figure 3. The experiments included the following
methods: linear classification with 10 two-way classifiers trained to classify
one class from the other nine; pairwise linear classifier with 45 two-way
classifiers trained to classify one class versus one other, followed by a voting
mechanism; K-Nearest Neighbor classifiers with a simple Euclidean distance on
pixel images; 40-dimension principal component analysis followed by degree 2
polynomial classifier; radial basis function network with 1000 Gaussian RBFs
trained with K-







Fig. 3. Error rate on the test set (%) for various classification methods.
[deslant] indicates that the classifier was trained and tested on the deslanted
version of the database. [dist] indicates that the training set was augmented
with artificially distorted examples. [16x16] indicates that the system used the
16x16 pixel images. The uncertainty in the quoted error rates is about 0.1%.
Methods shown, in order: Linear; [deslant] Linear; Pairwise; K-NN Euclidean;
[deslant] K-NN Euclidean; 40 PCA + quadratic; 1000 RBF + linear; [16x16] Tangent
Distance; SVM poly 4; RS-SVM poly 5; [dist] V-SVM poly 9; 28x28-300-10; [dist]
28x28-300-10; [deslant] 20x20-300-10; 28x28-1000-10; [dist] 28x28-1000-10;
28x28-300-100-10; [dist] 28x28-300-100-10; 28x28-500-150-10; [dist]
28x28-500-150-10; [16x16] LeNet-1; LeNet-4; LeNet-4 / Local; LeNet-4 / K-NN;
LeNet-5; [dist] LeNet-5; [dist] Boosted LeNet-4.



means per class and followed by a linear classifier; Tangent Distance classifier:
a nearest-neighbor classifier where the distance is made invariant to small
geometric distortions by projecting the pattern onto linear approximations of the
manifolds generated by distorting the prototypes; Support Vector Machines of
various types (regular SVM, reduced-set SVM, virtual SVM) using polynomial
kernels; fully connected neural nets with one or two hidden layers and various
numbers of hidden units; LeNet-1: a small convolutional neural net with only
2600 free parameters and 100,000 connections; LeNet-4: a convolutional neural
net with 17,000 free parameters and 260,000 connections, similar to, but slightly
different from LeNet-5; Boosted LeNet-4: a classifier obtained by voting three
instances of LeNet-4 trained on different subsets of the database; and finally
LeNet-5.

Concerning fully-connected neural networks, it remains somewhat of a mystery
that unstructured neural nets with such a large number of free parameters
manage to achieve reasonable performance. We conjecture that the dynamics
of gradient descent learning in multilayer nets has a "self-regularization"
effect. Because the origin of weight space is a saddle point that is attractive
in almost every direction, the weights invariably shrink during the first few
epochs. Small weights cause the sigmoids to operate in the quasi-linear region,
making







the network essentially equivalent to a low-capacity, single-layer network. As
the learning proceeds, the weights grow, which progressively increases the
effective capacity of the network. This seems to be an almost perfect, if
fortuitous, implementation of Vapnik's "Structural Risk Minimization" principle
[31].

The Support Vector Machine [31] has excellent accuracy, which is most
remarkable, because unlike the other high performance classifiers, it does not
include a priori knowledge about the problem [4]. In fact, this classifier would
do just as well if the image pixels were permuted with a fixed mapping and lost
their pictorial structure. However, reaching levels of performance comparable to
the Convolutional Neural Networks can only be done at considerable expense in
memory and computational requirements. The computational requirements of
Burges's reduced-set SVM are within a factor of two of LeNet-5, and the error
rate is very close. Improvements of those results are expected, as the technique
is relatively new.

Boosted LeNet-4 performed best, achieving a score of 0.7%, closely followed
by LeNet-5 at 0.8%. Boosted LeNet-4 [6] is based on theoretical work by
R. Schapire [29]. Three LeNet-4s are combined: the first one is trained the usual
way, the second one is trained on patterns that are filtered by the first net so
that the second machine sees a mix of patterns, 50% of which the first net got
right, and 50% of which it got wrong. Finally, the third net is trained on new
patterns on which the first and the second nets disagree. During testing, the
outputs of the three nets are simply added.
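The filtering scheme behind Boosted LeNet-4 can be sketched with stand-in classifiers. Everything concrete below is hypothetical: the three "nets" are toy functions returning per-class scores, the samples are integers labeled by parity, and higher scores are taken to mean a better match.

```python
def predict(net, pattern):
    """Index of the highest score returned by `net` for `pattern`."""
    scores = net(pattern)
    return max(range(len(scores)), key=scores.__getitem__)

def second_net_training_set(samples, net1):
    """Mix of patterns: half that net1 got right, half that it got wrong."""
    right = [(p, y) for p, y in samples if predict(net1, p) == y]
    wrong = [(p, y) for p, y in samples if predict(net1, p) != y]
    n = min(len(right), len(wrong))
    return right[:n] + wrong[:n]

def third_net_training_set(samples, net1, net2):
    """Patterns on which the first two nets disagree."""
    return [(p, y) for p, y in samples
            if predict(net1, p) != predict(net2, p)]

def boosted_predict(nets, pattern):
    """At test time the outputs of the nets are simply added."""
    totals = [sum(col) for col in zip(*(net(pattern) for net in nets))]
    return max(range(len(totals)), key=totals.__getitem__)

# Toy data: integers 0..9 labeled by parity, and three stand-in "nets".
samples = [(p, p % 2) for p in range(10)]
net1 = lambda p: [1.0, 0.0]                      # always answers class 0
net2 = lambda p: [0.0, 1.0] if (p % 2 == 1 and p < 6) else [0.6, 0.4]
net3 = lambda p: [0.0, 1.0] if p % 2 == 1 else [1.0, 0.0]

second = second_net_training_set(samples, net1)
third = third_net_training_set(samples, net1, net2)
answer = boosted_predict([net1, net2, net3], 3)
```

In the real procedure each filtered set is used to train the next network; here the stand-ins are fixed so that only the data flow is shown.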

When plenty of data is available, many methods can attain respectable accuracy.
Compared to other methods, convolutional neural nets offer not only the best
accuracy, but also good speed, low memory requirements, and excellent
robustness, as discussed below.

2.5 Invariance and Noise Resistance 

While fully invariant recognition of complex shapes is still an elusive goal, it
seems that convolutional networks, because of their architecture, offer a partial
answer to the problem of invariance or robustness with respect to distortions,
varying position, scale and orientation, as well as intrinsic class variability.
Figure 4 shows several examples of unusual and distorted characters that are
correctly recognized by LeNet-5. For these experiments, the training samples were
artificially distorted using random planar affine transformations, and the pixels
in the training images were randomly flipped with probability 0.1 to increase
the noise resistance. The top row in the figure shows the robustness to size and
orientation variations. It is estimated that accurate recognition occurs for
scale variations up to about a factor of 2, vertical shift variations of plus or
minus about half the height of the character, and rotations up to plus or minus
30 degrees. While the characters are distorted during training, it seems that
the robustness of the network subsists for distortions that are significantly
larger than the ones used during training. Figure 4 includes examples of
characters written in very unusual styles. Needless to say, there are no such
examples in the training set. Nevertheless, the network classifies them
correctly, which seems to suggest








Fig. 4. Examples of unusual, distorted, and noisy characters correctly recognized
by LeNet-5. The grey-level of the output label represents the penalty (lighter
for higher penalties)



that the features that have been learned have some degree of generality. Lastly,
figure 4 includes examples that demonstrate LeNet-5's robustness to extremely
high levels of structured noise. Handling these images with traditional
segmentation and feature extraction techniques would pose insurmountable
problems. Even though the only noise used during training was random pixel
flipping, it seems that the network can eliminate the adverse effects of
non-sensical but structured marks from images such as the 3 and the 8 in the
second row. This demonstrates a somewhat puzzling ability of such networks to
perform (if implicitly) a kind of elementary feature binding solely through
feed-forward linear combinations and sigmoid functions.

Animated examples of LeNet-5 in action are available on the Internet at
http://www.research.att.com/~yann



















3 Multiple Object Recognition with Space Displacement 
Neural Networks 

A major conceptual problem in vision and pattern recognition is how to recognize
individual objects when those objects cannot be easily segmented out of their
surrounding. In general, this poses the problem of feature binding: how to
identify and bind together features that belong to a single object, while
suppressing features that belong to the background or to other objects. The
common wisdom is that, except in the simplest case, one cannot identify and bind
together the features of an object unless one knows what object to look for.

In handwriting recognition, the problem is to separate a character from its
neighbors, given that the neighbors can touch it or overlap with it. The most
common solution is called "heuristic over-segmentation". It consists in
generating a large number of potential cuts between characters using heuristic
image analysis techniques. Candidate characters are formed by combining
contiguous segments in multiple ways. The candidate characters are then sent to
the recognizer for classification and scoring. A simple graph-search technique
then finds the consistent sequence of character candidates with the best overall
score.

There is a simple alternative to explicitly segmenting images of character
strings using heuristics. The idea is to sweep a recognizer across all possible
locations on an image of the entire word or string whose height has been
normalized. With this technique, no segmentation heuristics are required.
However, there are problems with this approach. First, the method is in general
quite expensive. The recognizer must be applied at every possible location on
the input, or at least at a large enough subset of locations so that
misalignments of characters in the field of view of the recognizers are small
enough to have no effect on the error rate. Second, when the recognizer is
centered on a character to be recognized, the neighbors of the center character
will be present in the field of view of the recognizer, possibly touching the
center character. Therefore the recognizer must be able to correctly recognize
the character in the center of its input field, even if neighboring characters
are very close to, or touching the central character. Third, a word or character
string cannot be perfectly size normalized. Individual characters within a
string may have widely varying sizes and baseline positions. Therefore the
recognizer must be very robust to shifts and size variations.

These three problems are elegantly circumvented if a convolutional network
is replicated over the input field. First of all, as shown in the previous
section, convolutional neural networks are very robust to shifts and scale
variations of the input image, as well as to noise and extraneous marks in the
input. These properties take care of the latter two problems mentioned in the
previous paragraph. Second, convolutional networks provide a drastic saving in
computational requirement when replicated over large input fields. A replicated
convolutional network, also called a Space Displacement Neural Network or SDNN
[22], is shown in Figure 5. While scanning a recognizer can be prohibitively
expensive in general, convolutional networks can be scanned or replicated very
efficiently over large, variable-size input fields. Consider one instance of a
convolutional







Fig. 5. A Space Displacement Neural Network is a convolutional network that
has been replicated over a wide input field



net and its alter ego at a nearby location. Because of the convolutional nature
of the network, units in the two instances that look at identical locations on
the input have identical outputs, therefore their states do not need to be
computed twice. Only a thin "slice" of new states that are not shared by the two
network instances needs to be recomputed. When all the slices are put together,
the result is simply a larger convolutional network whose structure is identical
to the original network, except that the feature maps are larger in the
horizontal dimension. In other words, replicating a convolutional network can be
done simply by increasing the size of the fields over which the convolutions are
performed, and by replicating the output layer accordingly. The output layer
effectively becomes a convolutional layer. An output whose receptive field is
centered on an elementary object will produce the class of this object, while an
in-between output may indicate no character or contain rubbish. The outputs can
be interpreted as evidences for the presence of objects at all possible
positions in the input field.
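The effect of widening the input on the feature maps of a replicated LeNet-5 can be worked out with a few lines of arithmetic. The sketch assumes the layer geometry of Section 2.2 (5-wide convolutions without padding and 2x sub-sampling) and input widths of the form 32 + 4k, so that every sub-sampling step divides evenly.

```python
def sdnn_widths(input_width):
    """Widths of C1, S2, C3, S4 and C5 feature maps for a given input width."""
    c1 = input_width - 4   # 5-wide convolution, no padding
    s2 = c1 // 2           # 2x sub-sampling
    c3 = s2 - 4
    s4 = c3 // 2
    c5 = s4 - 4
    return [c1, s2, c3, s4, c5]

def sdnn_output_width(input_width):
    """Number of horizontal output positions of the replicated network."""
    return sdnn_widths(input_width)[-1]

# A 64-pixel-wide input yields 9 output positions, i.e. 9 network instances
# whose windows are 4 pixels apart.
outputs = sdnn_output_width(64)

# C1 columns computed when the 9 instances are run independently, versus
# once in the enlarged convolutional network.
naive_c1_columns = outputs * sdnn_widths(32)[0]
shared_c1_columns = sdnn_widths(64)[0]
```

Each extra 4 pixels of input width adds one output position, because the two sub-sampling layers together reduce the horizontal resolution by a factor of 4. The gap between `naive_c1_columns` and `shared_c1_columns` is a rough indication of the computation saved by sharing.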

The SDNN architecture seems particularly attractive for recognizing cursive
handwriting where no obvious segmentation heuristics exist. Although the idea
of SDNN is quite old [10,22], and very attractive by its simplicity, it has not
generated wide interest until recently because of the enormous demands it puts
on the recognizer.







Fig. 6. An example of multiple character recognition with SDNN. With SDNN,
no explicit segmentation is performed



3.1 Interpreting the Output of an SDNN 

The output of a horizontally replicated SDNN is a sequence of vectors which
encode the likelihoods, penalties, or scores of finding a character of a
particular class label at the corresponding location in the input. A
post-processor is required to pull out the best possible label sequence from
this vector sequence. An example of SDNN output is shown in Figure 6. Very often,
individual characters are spotted by several neighboring instances of the
recognizer, a consequence of the robustness of the recognizer to horizontal
translations. Also quite often, characters are erroneously detected by
recognizer instances that see only a piece of a character. For example a
recognizer instance that only sees the right third of a "4" might output the
label 1. How can we eliminate those extraneous characters from the output
sequence and pull out the best interpretation? This can be done with a simple
weighted finite state machine. The sequence of vectors produced by the SDNN is
first turned into a linear graph constructed as follows. Each vector in the
output sequence is transformed into a bundle of arcs with a common source node
and target node. Each arc contains one of the possible character labels,
together with its corresponding penalty. Each bundle contains an additional arc
bearing the "none of the above" label with a penalty. These bundles are
concatenated in the order of the vector sequence (the target node of a bundle
becomes the source node of the next bundle). Each path in this graph is a
possible interpretation of the input. A grammar is then constructed as a
weighted finite-state machine that contains a model for each character. The
grammar ensures for example that neighboring characters must be separated by a
"none of the above" label (white space), and that successive occurrences of the
same label







Fig. 7. An SDNN applied to a noisy image of digit string. The digits shown in
the SDNN output represent the winning class labels, with a lighter grey level
for high-penalty answers



are probably produced by a single input character. The grammar and the linear
graph are then composed (a graph operation similar to a tensor product). The
composed graph contains all the paths of the linear graph that happen to be
grammatically correct. The Viterbi algorithm can then be used to find the path
with the smallest overall penalty.
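The post-processing described in this section can be approximated by a small dynamic program. The sketch below is a simplification, not the graph composition itself: the grammar is reduced to a rule on consecutive labels (a label may repeat, and two different characters must be separated by a blank), and the penalty values are invented for illustration.

```python
def viterbi_decode(penalties, blank="_"):
    """Find the lowest-penalty label path allowed by the simplified grammar.

    `penalties` is a list of dicts, one per SDNN output position, mapping
    each label (including `blank`) to its penalty.
    """
    labels = list(penalties[0])

    def allowed(prev, cur):
        # Same label continuing, or a transition through white space.
        return prev == cur or prev == blank or cur == blank

    best = {l: (penalties[0][l], [l]) for l in labels}
    for vec in penalties[1:]:
        new = {}
        for cur in labels:
            cost, path = min((best[p] for p in labels if allowed(p, cur)),
                             key=lambda cp: cp[0])
            new[cur] = (cost + vec[cur], path + [cur])
        best = new
    cost, path = min(best.values(), key=lambda cp: cp[0])

    # Successive occurrences of the same label count as one character.
    out, prev = [], blank
    for l in path:
        if l != blank and l != prev:
            out.append(l)
        prev = l
    return "".join(out), cost

# Hypothetical SDNN output: the fourth position sees the right part of the
# "4" and prefers the label "1".
outputs = [
    {"_": 0.1, "4": 2.0, "1": 3.0},
    {"_": 2.0, "4": 0.2, "1": 1.5},
    {"_": 1.5, "4": 0.3, "1": 1.0},
    {"_": 1.8, "4": 1.2, "1": 0.4},
    {"_": 0.2, "4": 2.5, "1": 2.0},
    {"_": 1.0, "4": 3.0, "1": 0.3},
    {"_": 0.1, "4": 3.0, "1": 2.0},
]
text, total_penalty = viterbi_decode(outputs)
```

Taking the lowest-penalty label at each position independently would read "411" here. With the grammar constraint, the spurious "1" directly after the "4" is rejected and the decoded string is "41".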

3.2 Experiments with SDNN 

In a series of experiments, LeNet-5 was trained with the goal of being replicated
into an SDNN so as to recognize multiple characters without segmentations. The
data was generated from the previously described MNIST set as follows. Training
images were composed of a central character, flanked by two side characters
picked at random in the training set. The separation between the bounding
boxes of the characters was chosen at random between -1 and 4 pixels. In other
instances, no central character was present, in which case the desired output
of the network was the blank space class. In addition, training images were
degraded by randomly flipping the pixels with probability 0.1.

Figures 6 and 7 show a few examples of successful recognitions of multiple
characters by the LeNet-5 SDNN. Standard techniques based on Heuristic
Over-Segmentation would likely fail on most of those examples. The robustness











of the network to scale and vertical position variations allows it to recognize
characters in such strings. More importantly, it seems that the network is able
to individually recognize the characters even when there is a significant
overlap with the neighbors. It is also able to correctly group disconnected
pieces of ink that form characters, as exemplified in the upper half of the
figure. In the top left example, the 4 and the 0 are more connected to each
other than they are connected with themselves, yet the system correctly
identifies the 4 and the 0 as separate objects. The top right example is
interesting for several reasons. First the system correctly identifies the three
individual "1". Second, the left half and right half of the disconnected 4 are
correctly grouped, even though no simple proximity criterion could decide to
associate the left half of the 4 to the vertical bar on its left or on its right.
The right half of the 4 does cause the appearance of an erroneous "1" on the
SDNN output, but this "1" is removed by the grammar which prevents different
non-blank characters from appearing on contiguous outputs. The bottom left
example demonstrates that extraneous marks that do not belong to identifiable
characters are suppressed even though they may connect genuine characters to
each other. The lower right example shows the combined robustness to character
overlaps, vertical shifts, size variations, and noise.

Several authors have argued that invariance and feature binding for multiple
object recognition requires specific mechanisms involving feedback, explicit
switching devices (3-way multiplicative connections) [11], object-centered
representations, graph matching mechanisms, or generative models that attempt to
simultaneously extract the pose and the category of the objects. It is somewhat
disconcerting to observe that the above SDNN seems to "solve" the feature
binding problem, albeit partially and in a restricted context, even though it
possesses no built-in machinery to do it explicitly. If nothing else, these
experiments show that purely feed-forward "numerical" multi-layer systems with
a fixed architecture can emulate functions that appear combinatorial, and are
qualitatively much more complex than anticipated by most (including the authors).

Several short animations of the LeNet-5 SDNN, including some with characters
that move on top of each other, can be viewed at
http://www.research.att.com/~yann

3.3 Face Detection and Spotting with SDNN 

An interesting application of SDNNs is object detection and spotting. The
invariance properties of Convolutional Networks, combined with the efficiency
with which they can be replicated over large fields, suggest that they can be
used for "brute force" object spotting and detection in large images. The main
idea is to train a single Convolutional Network to distinguish images of the
object of interest from images present in the background. Once trained, the
network is replicated so as to cover the entire image to be analyzed, thereby
forming a two-dimensional Space Displacement Neural Network. The output of the
SDNN is a two-dimensional plane in which the most activated units indicate the
presence of the object of interest in the corresponding receptive field. Since
the sizes of the objects to be detected within the image are unknown, the image
can be







presented to the network at multiple resolutions, and the results at multiple
resolutions combined. The idea has been applied to face location [30], address
block location on envelopes [33], and hand tracking in video [24].

To illustrate the method, we will consider the case of face detection in images
as described in [30]. First, images containing faces at various scales are
collected. Those images are filtered through a zero-mean Laplacian filter so as
to remove variations in global illumination and large-scale illumination
gradients. Then, training samples of faces and non-faces are manually extracted
from these images. The face sub-images are then size normalized so that the
height of the entire face is approximately 20 pixels while keeping fairly large
variations (within a factor of two). The scales of background sub-images are
picked at random. A single convolutional network is trained on those samples to
classify face sub-images from non-face sub-images. When a scene image is to be
analyzed, it is first filtered through the Laplacian filter, and sub-sampled by
ratios that are successive powers of the square root of 2. The network is
replicated over each of the images at each resolution. A simple voting technique
is used to combine the results from multiple resolutions.
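
The sub-sampling schedule described above can be sketched as follows. This is
only an illustration, not the implementation used in [30]: the function name
pyramid_sizes, the example image size, and the stopping rule (stop once the
smaller side falls below 20 pixels, the approximate face height quoted above)
are assumptions made for the example.

```python
import math

def pyramid_sizes(width, height, min_size=20, ratio=math.sqrt(2)):
    """Image sizes obtained by sub-sampling with successive powers of `ratio`.

    `min_size` is an assumed stopping rule: once the smaller side drops below
    it, a detector trained on roughly 20-pixel-high faces has nothing to scan.
    """
    sizes = []
    level = 0
    while True:
        factor = ratio ** level
        w = int(round(width / factor))
        h = int(round(height / factor))
        if min(w, h) < min_size:
            break
        sizes.append((w, h))
        level += 1
    return sizes

# Hypothetical 160x120 scene image.
sizes = pyramid_sizes(160, 120)
for w, h in sizes:
    print(w, h)
```

Every second level halves the image, so a face that is too large for the
network at one resolution is close to the trained size at some coarser level.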

More recently, some authors have used Neural Networks, or other classifiers such
as Support Vector Machines, for face detection with great success [27, 25].
Their systems are somewhat similar to the one described above, including the
idea of presenting the image to the network at multiple scales. But since those
systems do not use Convolutional Networks, they cannot take advantage of the
speedup described here, and have to rely on other techniques, such as
pre-filtering and real-time tracking, to keep the computational requirement
within reasonable limits. In addition, because those classifiers are much less
invariant to scale variations than Convolutional Networks, it is necessary to
use a large number of multiscale images with finely-spaced scales.

4 Graph Transformer Networks 

Despite the apparent ability of the systems described in the previous sections
to solve combinatorial problems with non-combinatorial means, there are
situations where the need for compositionality and combinatorial searches is
inescapable. A good example is language modeling, and more generally, models
that involve finite-state grammars, weighted finite-state machines, or other
graph-based knowledge representations such as finite-state transducers. The main
point of this section is to show that gradient-based learning techniques can be
extended to situations where those models are used.

It is easy to show that the modular gradient-based learning model presented in
section 1 can be applied to networks of modules whose state variables Xn are
graphs with numerical information attached to the arcs (scalars, vectors, etc.),
rather than fixed-size vectors. There are two main conditions for this. First,
the modules must produce the values on the output graphs from the values on the
input graphs through differentiable functions. Second, the overall loss function
should be continuous and differentiable almost everywhere with respect to the
parameters. Networks of graph-manipulating modules are called Graph Transformer
Networks [3, 19].

4.1 Word Recognition with a Graph Transformer Network 

Though the Space Displacement Neural Net method presented in the previous
section is very promising for word recognition applications, the more
traditional method (and so far still the most developed) is called heuristic
over-segmentation. With this method, the word is segmented into candidate
characters using heuristic image analysis techniques. Unfortunately, it is
almost impossible to devise techniques that will infallibly segment naturally
written words into well formed characters. This section and the next describe in
detail a simple example of GTN for reading words. The method can rely on
gradient-based learning to avoid the expensive and unreliable task of manually
segmenting and hand-truthing a database so as to train the recognizer on
individual characters.

Segmentation. Given a word, a number of candidate cuts are generated with
heuristic methods. The cut generation heuristic is designed so as to generate
more cuts than necessary, in the hope that the “correct” set of cuts will be
included. Once the cuts have been generated, alternative segmentations are best
represented by a graph, called the segmentation graph. The segmentation graph
is a Directed Acyclic Graph (DAG) with a start node and an end node. Each
internal node is associated with a candidate cut produced by the segmentation
algorithm. Each arc between a source node and a destination node is associated
with an image that contains all the ink between the cut associated with the
source node and the cut associated with the destination node. An arc is created
between two nodes if the segmenter decided that the piece(s) of ink between the
corresponding cuts could form a candidate character. Typically, each individual
piece of ink would be associated with an arc. Pairs of successive pieces of ink
would also be included, unless they are separated by a wide gap, which is a
clear indication that they belong to different characters. Each complete path
through the graph contains each piece of ink once and only once. Each path
corresponds to a different way of associating pieces of ink together so as to
form characters.
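
The construction above can be illustrated with a small sketch. It assumes the
simplest version of the rule stated in the text: one arc per piece of ink, plus
one arc per pair of successive pieces unless a wide gap separates them. The
function names and the wide_gap_after argument are hypothetical; a real
segmenter would decide the arcs from image analysis.

```python
def build_segmentation_graph(num_pieces, wide_gap_after=()):
    """Arcs (src, dst) of a DAG whose nodes 0..num_pieces are candidate cuts.

    Arc (i, i+1) carries a single piece of ink. Arc (i, i+2) carries two
    successive pieces and is omitted when a wide gap follows piece i.
    """
    arcs = []
    for i in range(num_pieces):
        arcs.append((i, i + 1))
        if i + 2 <= num_pieces and i not in wide_gap_after:
            arcs.append((i, i + 2))
    return arcs

def enumerate_paths(arcs, start, end):
    """List every start-to-end path as a list of arcs."""
    if start == end:
        return [[]]
    paths = []
    for src, dst in arcs:
        if src == start:
            for tail in enumerate_paths(arcs, dst, end):
                paths.append([(src, dst)] + tail)
    return paths

# Three pieces of ink, no wide gaps: three alternative segmentations.
arcs = build_segmentation_graph(3)
paths = enumerate_paths(arcs, 0, 3)

# A wide gap after the first piece forbids merging pieces 0 and 1.
gapped_paths = enumerate_paths(build_segmentation_graph(3, wide_gap_after={0}), 0, 3)
```

Each enumerated path spans nodes 0 to 3 without skipping or repeating a piece
of ink, which is the "once and only once" property stated above.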

Recognition Transformer and Viterbi Transformer. A simple GTN to recognize
character strings is shown in Figure 8. Only the right branch of the top half is
used for recognition. The left branch is used for the training procedure
described in the next sub-section. The GTN is composed of two main graph
transformers called the recognition transformer Trec, and the Viterbi
transformer Tvit. The goal of the recognition transformer is to generate a
graph, called the interpretation graph or recognition graph Gint, that contains
all the possible interpretations for all the possible segmentations of the
input. Each path in Gint represents one possible interpretation of one
particular segmentation of the input. The role of the Viterbi transformer is to
extract the best interpretation from the interpretation graph.




[Figure 8: diagram not recoverable from the extracted text. Labels that survive:
Loss Function, Viterbi Transformer, Gcvit, Gvit, Interpretation Graph, Neural
Net Weights, NN (recognizer instances), Segmentation Graph, Gseg.]
Fig. 8. A GTN Architecture for word recognition based on Heuristic
Over-Segmentation. During recognition, only the right-hand path of the top part
is used. For training with Viterbi training, only the left-hand path is used.
For discriminative Viterbi training, both paths are used. Quantities in square
brackets are penalties computed during the forward propagation. Quantities in
parentheses are partial derivatives computed during the backward propagation.






The recognition transformer Trec takes the segmentation graph Gseg as input, and
applies the recognizer for single characters to the images associated with each
of the arcs in the segmentation graph. The interpretation graph Gint has almost
the same structure as the segmentation graph, except that each arc is replaced
by a set of arcs from and to the same node. In this set of arcs, there is one
arc for each possible class for the image associated with the corresponding arc
in Gseg. To each arc is attached a class label, and the penalty that the image
belongs to this class as produced by the recognizer. If the segmenter has
computed penalties for the candidate segments, these penalties are combined with
the penalties computed by the character recognizer, to obtain the penalties on
the arcs of the interpretation graph. Although combining penalties of different
nature seems highly heuristic, the GTN training procedure will tune the
penalties and take advantage of this combination anyway. Each path in the
interpretation graph corresponds to a possible interpretation of the input word.
The penalty of a particular interpretation for a particular segmentation is
given by the sum of the arc penalties along the corresponding path in the
interpretation graph. Computing the penalty of an interpretation independently
of the segmentation requires combining the penalties of all the paths with that
interpretation. This can be done using the “forward” algorithm widely used in
Hidden Markov Models.

The Viterbi transformer produces a graph Gvit with a single path. This path is
the path of least cumulated penalty in the interpretation graph. The result of
the recognition can be produced by reading off the labels of the arcs along the
graph Gvit extracted by the Viterbi transformer. The Viterbi transformer owes
its name to the famous Viterbi algorithm to find the shortest path in a graph.
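
As an illustration of the Viterbi transformer, the sketch below finds the path
of least cumulated penalty in a small interpretation graph and reads off its
labels. The arc representation (src, dst, label, penalty), the requirement that
nodes be numbered so that src < dst, and the example penalties are all
assumptions made for this example, not the authors' data structures.

```python
def viterbi_path(arcs, start, end):
    """Least-penalty path in a DAG.

    arcs: list of (src, dst, label, penalty) with src < dst, so visiting arcs
    in order of increasing src is a valid topological order.
    Returns (total penalty, list of arc indices along the best path).
    """
    best = {start: (0.0, None)}  # node -> (penalty so far, incoming arc index)
    for idx in sorted(range(len(arcs)), key=lambda i: arcs[i][0]):
        src, dst, _label, penalty = arcs[idx]
        if src not in best:
            continue  # node not reachable from the start node
        candidate = best[src][0] + penalty
        if dst not in best or candidate < best[dst][0]:
            best[dst] = (candidate, idx)
    # Trace the best path back from the end node.
    path, node = [], end
    while node != start:
        idx = best[node][1]
        path.append(idx)
        node = arcs[idx][0]
    path.reverse()
    return best[end][0], path

# Hypothetical interpretation graph: two single-piece segments, each with two
# candidate classes, plus one arc treating both pieces as a single character.
g_int = [
    (0, 1, "3", 0.5), (0, 1, "8", 2.0),
    (1, 2, "4", 0.25), (1, 2, "9", 1.5),
    (0, 2, "2", 3.0),
]
penalty, path = viterbi_path(g_int, 0, 2)
answer = "".join(g_int[i][2] for i in path)
```

Here the two-character reading "34" (penalty 0.75) beats the single-character
reading "2" (penalty 3.0), so the segmentation is chosen as a by-product of the
search.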

4.2 Gradient-Based Training of a GTN 

The previous section describes the process of recognizing a string using
Heuristic Over-Segmentation, assuming that the recognizer is trained so as to
assign low penalties to the correct class label of correctly segmented
characters, high penalties to erroneous categories of correctly segmented
characters, and high penalties to all categories for badly formed characters.
This section explains how to train the system at the string level to do the
above without requiring manual labeling of character segments.

In many applications, there is enough a priori knowledge about what is expected
from each of the modules in order to train them separately. For example, with
Heuristic Over-Segmentation one could individually label single-character images
and train a character recognizer on them, but it might be difficult to obtain an
appropriate set of non-character images to train the model to reject wrongly
segmented candidates. Although separate training is simple, it requires
additional supervision information that is often lacking or incomplete (the
correct segmentation and the labels of incorrect candidate segments). The
following section describes two of the many gradient-based methods for training
GTN-based handwriting recognizers at the string level: Viterbi training and
discriminative Viterbi training. Unlike similar approaches in the context of
speech recognition, we make no recourse to a probabilistic interpretation, but
show that, within the Gradient-Based Learning approach, discriminative training
is a simple instance of the pervasive principle of error correcting learning.



Viterbi Training. During recognition, we select the path in the Interpretation
Graph that has the lowest penalty with the Viterbi algorithm. Ideally, we would
like this path of lowest penalty to be associated with the correct label
sequence as often as possible. An obvious loss function to minimize is therefore
the average over the training set of the penalty of the path associated with the
correct label sequence that has the lowest penalty. The goal of training will be
to find the set of recognizer parameters (the weights, if the recognizer is a
neural network) that minimize the average penalty of this “correct” lowest
penalty path. The gradient of this loss function can be computed by
back-propagation through the GTN architecture shown in figure 8, using only the
left-hand path of the top part, and ignoring the right half. This training
architecture contains a graph transformer called a path selector, inserted
between the Interpretation Graph and the Viterbi Transformer. This transformer
takes the interpretation graph and the desired label sequence as input. It
extracts from the interpretation graph those paths that contain the correct
(desired) label sequence. Its output graph Gc is called the constrained
interpretation graph, and contains all the paths that correspond to the correct
label sequence. The constrained interpretation graph is then sent to the Viterbi
transformer which produces a graph Gcvit with a single path. This path is the
“correct” path with the lowest penalty. Finally, a path scorer transformer takes
Gcvit, and simply computes its cumulated penalty Ccvit by adding up the
penalties along the path. The output of this GTN is the loss function for the
current pattern:



$$E_{vit} = C_{cvit} \qquad (3)$$



The only label information that is required by the above system is the sequence
of desired character labels. No knowledge of the correct segmentation is
required on the part of the supervisor, since the system chooses among the
segmentations in the interpretation graph the one that yields the lowest
penalty.

The process of back-propagating gradients through the Viterbi training GTN is
now described. As explained in section 1, the gradients must be propagated
backwards through all modules of the GTN, in order to compute gradients in
preceding modules and thereafter tune their parameters. Back-propagating
gradients through the path scorer is quite straightforward. The partial
derivatives of the loss function with respect to the individual penalties on the
constrained Viterbi path Gcvit are equal to 1, since the loss function is simply
the sum of those penalties. Back-propagating through the Viterbi Transformer is
equally simple. The partial derivatives of Evit with respect to the penalties on
the arcs of the constrained graph Gc are 1 for those arcs that appear in the
constrained Viterbi path Gcvit, and 0 for those that do not. Why is it
legitimate to back-propagate through an essentially discrete function such as
the Viterbi Transformer? The answer is that the Viterbi Transformer is nothing
more than a collection of min functions and adders put together. It can be shown
easily that gradients can be back-propagated through min functions without
adverse effects. Back-propagation through the path selector transformer is
similar to back-propagation through the Viterbi transformer. Arcs in Gint that
appear in Gc have the same gradient as the corresponding arc in Gc, i.e. 1 or 0,
depending on whether the arc appears in Gcvit. The other arcs, i.e. those that
do not have an alter ego in Gc because they do not contain the right label, have
a gradient of 0. During the forward propagation through the recognition
transformer, one instance of the recognizer for single characters was created
for each arc in the segmentation graph. The state of the recognizer instances
was stored. Since each arc penalty in Gint is produced by an individual output
of a recognizer instance, we now have a gradient (1 or 0) for each output of
each instance of the recognizer. Recognizer outputs that have a non-zero
gradient are part of the correct answer, and will therefore have their value
pushed down. The gradients present on the recognizer outputs can be
back-propagated through each recognizer instance. For each recognizer instance,
we obtain a vector of partial derivatives of the loss function with respect to
the recognizer instance parameters. All the recognizer instances share the same
parameter vector, since they are merely clones of each other, therefore the full
gradient of the loss function with respect to the recognizer’s parameter vector
is simply the sum of the gradient vectors produced by each recognizer instance.
Viterbi training, though formulated differently, is often used in HMM-based
speech recognition systems [26].
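
The forward and backward passes of Viterbi training down to the arc penalties
of Gint can be sketched as below. This is a simplified illustration under
stated assumptions: the path selector and the Viterbi transformer are merged
into one dynamic program over (node, number of labels matched), arcs are
hypothetical (src, dst, label, penalty) tuples with src < dst, and the
back-propagation into the recognizer instances themselves is left out.

```python
def constrained_viterbi(arcs, start, end, target):
    """Lowest-penalty path whose arc labels spell `target`.

    Plays the role of the path selector followed by the Viterbi transformer.
    Returns (Ccvit, list of arc indices forming Gcvit).
    """
    # state (node, k) -> (penalty, incoming arc index, previous state)
    best = {(start, 0): (0.0, None, None)}
    for i in sorted(range(len(arcs)), key=lambda j: arcs[j][0]):
        src, dst, label, penalty = arcs[i]
        for k in range(len(target)):
            state = (src, k)
            if state in best and label == target[k]:
                candidate = best[state][0] + penalty
                nxt = (dst, k + 1)
                if nxt not in best or candidate < best[nxt][0]:
                    best[nxt] = (candidate, i, state)
    final = (end, len(target))
    path, state = [], final
    while best[state][1] is not None:
        _, i, previous = best[state]
        path.append(i)
        state = previous
    return best[final][0], path[::-1]

def viterbi_training_gradient(arcs, start, end, target):
    """Loss Evit = Ccvit and its partial derivatives w.r.t. each arc penalty."""
    loss, path = constrained_viterbi(arcs, start, end, target)
    grad = [0.0] * len(arcs)
    for i in path:
        grad[i] = 1.0  # arcs on Gcvit get 1, all other arcs of Gint get 0
    return loss, grad

# Hypothetical interpretation graph and desired label sequence.
g_int = [
    (0, 1, "3", 0.5), (0, 1, "8", 2.0),
    (1, 2, "4", 0.25), (1, 2, "9", 1.5),
    (0, 2, "2", 3.0),
]
loss, grad = viterbi_training_gradient(g_int, 0, 2, "39")
```

In a full system each entry of grad would be fed to the output of the
recognizer instance that produced that arc, and the resulting parameter
gradients would be summed over instances, as the text describes.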

While it seems simple and satisfying, this training architecture has a flaw that
can potentially be fatal. If the recognizer is a simple neural network with
sigmoid output units, the minimum of the loss function is attained, not when the
recognizer always gives the right answer, but when it ignores the input, and
sets its output to a constant vector with small values for all the components.
This is known as the collapse problem. The collapse only occurs if the
recognizer outputs can simultaneously take their minimum value. If on the other
hand the recognizer’s output layer contains RBF units with fixed parameters,
then there is no such trivial solution. This is due to the fact that a set of
RBFs with fixed distinct parameter vectors cannot simultaneously take their
minimum value. In this case, the complete collapse described above does not
occur. However, this does not totally prevent the occurrence of a milder
collapse because the loss function still has a “flat spot” for a trivial
solution with constant recognizer output. This flat spot is a saddle point, but
it is attractive in almost all directions and is very difficult to get out of
using gradient-based minimization procedures. If the parameters of the RBFs are
allowed to adapt, then the collapse problem reappears because the RBF centers
can all converge to a single vector, and the underlying neural network can learn
to produce that vector, and ignore the input. A different kind of collapse
occurs if the widths of the RBFs are also allowed to adapt. The collapse only
occurs if a trainable module such as a neural network feeds the RBFs. Another
problem with Viterbi training is that the penalty of the answer cannot be used
reliably as a measure of confidence because it does not take low-penalty (or
high-scoring) competing answers into account. A simple way to address this
problem and to avoid the collapse is to train the whole system with a
discriminative loss function as described in the next section.



Discriminative Viterbi Training. The idea of discriminative Viterbi training is
to not only minimize the cumulated penalty of the lowest penalty path with the
correct interpretation, but also to somehow increase the penalty of competing
and possibly incorrect paths that have a dangerously low penalty. This type of
criterion is called discriminative because it plays the good answers against the
bad ones. Discriminative training procedures can be seen as attempting to build
appropriate separating surfaces between classes rather than to model individual
classes independently of each other.

One example of discriminative criterion is the difference between the penalty of
the Viterbi path in the constrained graph, and the penalty of the Viterbi path
in the (unconstrained) interpretation graph, i.e. the difference between the
penalty of the best correct path, and the penalty of the best path (correct or
incorrect). The corresponding GTN training architecture is shown in figure 8.
The left side of the diagram is identical to the GTN used for non-discriminative
Viterbi training. This loss function reduces the risk of collapse because it
forces the recognizer to increase the penalty of wrongly recognized objects.
Discriminative training can also be seen as another example of error correction
procedure, which tends to minimize the difference between the desired output
computed in the left half of the GTN in figure 8 and the actual output computed
in the right half of figure 8.

Let the discriminative Viterbi loss function be denoted Edvit, and let us call
Ccvit the penalty of the Viterbi path in the constrained graph, and Cvit the
penalty of the Viterbi path in the unconstrained interpretation graph:

$$E_{dvit} = C_{cvit} - C_{vit} \qquad (4)$$

Edvit is always non-negative, since the constrained graph is a subset of the
paths in the interpretation graph, and the Viterbi algorithm selects the path
with the lowest total penalty. In the ideal case, the two paths Gcvit and Gvit
coincide, and Edvit is zero.

Back-propagating gradients through the discriminative Viterbi GTN adds some
“negative” training to the previously described non-discriminative training.
Figure 8 shows how the gradients are back-propagated. The left half is identical
to the non-discriminative Viterbi training GTN, therefore the back-propagation
is identical. The gradients back-propagated through the right half of the GTN
are multiplied by -1, since Cvit contributes to the loss with a negative sign.
Otherwise the process is similar to the left half. The gradients on arcs of Gint
get positive contributions from the left half and negative contributions from
the right half. The two contributions must be added, since the penalties on Gint
arcs are sent to the two halves through a “Y” connection in the forward pass.
Arcs in Gint that appear neither in Gvit nor in Gcvit have a gradient of zero.
They do not contribute to the cost. Arcs that appear in both Gvit and Gcvit also
have zero gradient. The -1 contribution from the right half cancels the +1
contribution from the left half. In other words, when an arc is rightfully part
of the answer, there is no gradient. If an arc appears in Gcvit but not in Gvit,
the gradient is +1. The arc should have had a lower penalty to make it to Gvit.
If an arc is in Gvit but not in Gcvit, the gradient is -1. The arc had a low
penalty, but should have had a higher penalty since it is not part of the
desired answer. Variations of this technique have been used for speech
recognition. Driancourt and Bottou [5] used a version of it where the loss
function is saturated to a fixed value.
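
The gradient rule just described (+1 on arcs only in Gcvit, -1 on arcs only in
Gvit, 0 elsewhere) can be checked on a toy graph. The sketch below is an
illustration under assumptions: arcs are hypothetical (src, dst, label,
penalty) tuples with src < dst, and best_path is a helper written for this
example that covers both the constrained and the unconstrained search.

```python
def best_path(arcs, start, end, target=None):
    """Least-penalty start-to-end path as (penalty, set of arc indices).

    If `target` is given, only paths whose labels spell `target` qualify
    (constrained graph Gc); otherwise every path of Gint qualifies.
    """
    # state (node, arcs used so far) -> (penalty, incoming arc, previous state)
    best = {(start, 0): (0.0, None, None)}
    for i in sorted(range(len(arcs)), key=lambda j: arcs[j][0]):
        src, dst, label, penalty = arcs[i]
        for (node, k), (cost, _, _) in list(best.items()):
            if node != src:
                continue
            if target is not None and (k >= len(target) or label != target[k]):
                continue
            nxt = (dst, k + 1)
            if nxt not in best or cost + penalty < best[nxt][0]:
                best[nxt] = (cost + penalty, i, (node, k))
    finals = [s for s in best
              if s[0] == end and (target is None or s[1] == len(target))]
    state = min(finals, key=lambda s: best[s][0])
    total, used = best[state][0], set()
    while best[state][1] is not None:
        used.add(best[state][1])
        state = best[state][2]
    return total, used

def discriminative_viterbi(arcs, start, end, target):
    """Loss Edvit = Ccvit - Cvit and its gradient on the arc penalties of Gint."""
    c_cvit, g_cvit = best_path(arcs, start, end, target)
    c_vit, g_vit = best_path(arcs, start, end)
    grad = [float(i in g_cvit) - float(i in g_vit) for i in range(len(arcs))]
    return c_cvit - c_vit, grad

g_int = [
    (0, 1, "3", 0.5), (0, 1, "8", 2.0),
    (1, 2, "4", 0.25), (1, 2, "9", 1.5),
    (0, 2, "2", 3.0),
]
loss, grad = discriminative_viterbi(g_int, 0, 2, "39")
ideal_loss, ideal_grad = discriminative_viterbi(g_int, 0, 2, "34")
```

With the desired answer "39", the shared arc "3" receives no gradient, the
wrongly preferred arc "4" receives -1 and the desired arc "9" receives +1. With
the desired answer "34" the two Viterbi paths coincide, so both the loss and
the gradient vanish.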

An important advantage of global and discriminative training is that learning
focuses on the most important errors, and the system learns to integrate the
ambiguities from the segmentation algorithm with the ambiguities of the
character recognizer. There are other training procedures than the ones
described here, some of which are described in [19]. Complex Graph Transformer
modules that combine interpretation graphs with language models can be used to
take linguistic constraints into account [19].

5 Conclusion 

The methods described in this paper confirm what the history of Pattern
Recognition has already shown repeatedly: finding ways to increase the role of
learning and statistical estimation almost invariably improves the performance
of recognition systems. For 2D shape recognition, Convolutional Neural Networks
have been shown to eliminate the need for hand-crafted feature extractors.
Replicated Convolutional Networks have been shown to handle fairly complex
instances of the feature binding problem with a completely feed-forward, trained
architecture instead of the more traditional combinatorial hypothesis testing
methods. In situations where multiple hypothesis testing is unavoidable,
trainable Graph Transformer Networks have been shown to reduce the need for
hand-crafted heuristics, manual labeling, and manual parameter tuning in
document recognition systems.

Globally-trained Graph Transformer Networks have been applied successfully to
on-line handwriting recognition and check recognition [19]. The check
recognition system based on this concept is used commercially in several banks
across the US and reads millions of checks per day. The concepts and results in
this paper help establish the usefulness and relevance of gradient-based
minimization methods as a general organizing principle for learning in large
systems. It is clear that Graph Transformer Networks can be applied to many
situations where the domain knowledge or the state information can be
represented by graphs. This is the case in many visual tasks where graphs can
represent alternative interpretations of a scene, multiple instances of an
object, or relationship between objects.

References 

1. Bengio, Y., LeCun, Y., Nohl, C., and Burges, C. (1995). LeRec: A NN/HMM
hybrid for on-line handwriting recognition. Neural Computation, 7(5).







2. Bottou, L. and Gallinari, P. (1991). A framework for the cooperation of
learning algorithms. In Touretzky, D. and Lippmann, R., editors, Advances in
Neural Information Processing Systems, volume 3, Denver. Morgan Kaufmann.
3. Bottou, L., LeCun, Y., and Bengio, Y. (1997). Global training of document
processing systems using graph transformer networks. In Proc. of Computer Vision
and Pattern Recognition, Puerto-Rico. IEEE.
4. Burges, C. J. C. and Schoelkopf, B. (1997). Improving the accuracy and speed
of support vector machines. In Mozer, M., Jordan, M., and Petsche, T., editors,
Advances in Neural Information Processing Systems 9. The MIT Press, Cambridge.
5. Driancourt, X. and Bottou, L. (1991). MLP, LVQ and DP: Comparison &
cooperation. In Proceedings of the International Joint Conference on Neural
Networks, Seattle.
6. Drucker, H., Schapire, R., and Simard, P. (1993). Improving performance in
neural networks using a boosting algorithm. In Hanson, S. J., Cowan, J. D., and
Giles, C. L., editors, Advances in Neural Information Processing Systems 5,
pages 42-49, San Mateo, CA. Morgan Kaufmann.
7. Fukushima, K. (1975). Cognitron: A self-organizing multilayered neural
network. Biological Cybernetics, 20:121-136.
8. Fukushima, K. and Miyake, S. (1982). Neocognitron: A new algorithm for
pattern recognition tolerant of deformations and shifts in position. Pattern
Recognition, 15:455-469.
9. Hubel, D. H. and Wiesel, T. N. (1962). Receptive fields, binocular
interaction, and functional architecture in the cat's visual cortex. Journal of
Physiology (London), 160:106-154.
10. Keeler, J., Rumelhart, D., and Leow, W. K. (1991). Integrated segmentation
and recognition of hand-printed numerals. In Lippmann, R. P., Moody, J. M., and
Touretzky, D. S., editors, Neural Information Processing Systems, volume 3,
pages 557-563. Morgan Kaufmann Publishers, San Mateo, CA.
11. Lades, M., Vorbrüggen, J. C., Buhmann, J., and von der Malsburg, C. (1993).
Distortion invariant object recognition in the dynamic link architecture. IEEE
Trans. Comp., 42(3):300-311.
12. Lawrence, S., Giles, C. L., Tsoi, A. C., and Back, A. D. (1997). Face
recognition: A convolutional neural network approach. IEEE Transactions on
Neural Networks, 8(1):98-113.
13. LeCun, Y. (1986). Learning processes in an asymmetric threshold network. In
Bienenstock, E., Fogelman-Soulié, F., and Weisbuch, G., editors, Disordered
systems and biological organization, pages 233-240, Les Houches, France.
Springer-Verlag.
14. LeCun, Y. (1987). Modèles connexionnistes de l'apprentissage (connectionist
learning models). PhD thesis, Université P. et M. Curie (Paris 6).
15. LeCun, Y. (1988). A theoretical framework for back-propagation. In
Touretzky, D., Hinton, G., and Sejnowski, T., editors, Proceedings of the 1988
Connectionist Models Summer School, pages 21-28, CMU, Pittsburgh, Pa. Morgan
Kaufmann.
16. LeCun, Y. (1989). Generalization and network design strategies. In Pfeifer,
R., Schreter, Z., Fogelman, F., and Steels, L., editors, Connectionism in
Perspective, Zurich, Switzerland. Elsevier.
17. LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard,
W., and Jackel, L. D. (1989). Backpropagation applied to handwritten zip code
recognition. Neural Computation, 1(4):541-551.

18. LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard,
W., and Jackel, L. D. (1990). Handwritten digit recognition with a
back-propagation network. In Touretzky, D., editor, Advances in Neural
Information Processing Systems 2 (NIPS*89), Denver, CO. Morgan Kaufman.

19. LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. (1998). Gradient-based
learning applied to document recognition. Proceedings of the IEEE,
86(11):2278-2324.
20. LeCun, Y., Kanter, I., and Solla, S. (1991). Eigenvalues of covariance
matrices: application to neural-network learning. Physical Review Letters,
66(18):2396-2399.
21. Martin, G. L. (1993). Centered-object integrated segmentation and
recognition of overlapping hand-printed characters. Neural Computation,
5:419-429.
22. Matan, O., Burges, C. J. C., LeCun, Y., and Denker, J. S. (1992). Multi-digit
recognition using a space displacement neural network. In Moody, J. M., Hanson,
S. J., and Lippman, R. P., editors, Neural Information Processing Systems,
volume 4. Morgan Kaufmann Publishers, San Mateo, CA.
23. Mozer, M. C. (1991). The perception of multiple objects: A connectionist
approach. MIT Press-Bradford Books, Cambridge, MA.
24. Nowlan, S. and Platt, J. (1995). A convolutional neural network hand
tracker. In Tesauro, G., Touretzky, D., and Leen, T., editors, Advances in
Neural Information Processing Systems 7, pages 901-908, San Mateo, CA. Morgan
Kaufmann.
25. Osuna, E., Freund, R., and Girosi, F. (1997). Training support vector
machines: an application to face detection. In Proceedings of CVPR’97, pages
130-136. IEEE Computer Society Press.
26. Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected
applications in speech recognition. Proceedings of the IEEE, 77(2):257-286.
27. Rowley, H. A., Baluja, S., and Kanade, T. (1996). Neural network-based face
detection. In Proceedings of CVPR’96, pages 203-208. IEEE Computer Society
Press.
28. Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986). Learning
internal representations by error propagation. In Parallel distributed
processing: Explorations in the microstructure of cognition, volume I, pages
318-362. Bradford Books, Cambridge, MA.
29. Schapire, R. E. (1990). The strength of weak learnability. Machine Learning,
5(2):197-227.
30. Vaillant, R., Monrocq, C., and LeCun, Y. (1994). Original approach for the
localisation of objects in images. IEE Proc on Vision, Image, and Signal
Processing, 141(4):245-250.
31. Vapnik, V. N. (1995). The Nature of Statistical Learning Theory. Springer,
New-York.
32. Wang, J. and Jean, J. (1993). Multi-resolution neural networks for omnifont
character recognition. In Proceedings of International Conference on Neural
Networks, volume III, pages 1588-1593.
33. Wolf, R. and Platt, J. (1994). Postal address block location using a
convolutional locator network. In Cowan, J. D., Tesauro, G., and Alspector, J.,
editors, Advances in Neural Information Processing Systems 6, pages 745-752.



Author Index 



Belhumeur, P.N., 95, 132 
Belongie, S., 155 
Bengio, Y., 319 
Bottou, L., 319 

Carlsson, S., 58 
Cepeda, M., 31 
Chella, A., 264 
Cipolla, R., 72 
Curwen, R.W., 182 

Di Gesu, V., 264 

Forsyth, D., 3, 9, 302 

Georghiades, A.S., 95 

Haddon, J., 302 
Haffner, P., 319
Hartley, R., 246 

Infantino, I., 264 
Intravaia, D., 264 
Ioffe, S., 302 

Kriegman, D.J., 95, 132 



LeCun, Y., 319
Leung, T., 155 

Malik, J., 155 
Mohr, R., 217 

Mundy, J.L., 3, 22, 182, 234 

Pae, S.-i., 31 
Ponce, J., 31 
Proesmans, M., 196 

Sato, J., 72 
Saxena, T., 234, 246 
Schaffalitzky, F., 165 
Schmid, C., 217 
Shi, J., 155 
Sullivan, S., 31 

Torr, P.H.S., 277 
Tu, P., 246

Valenti, G., 264 
Van Gool, L., 196 

Yuille, A.L., 132 

Zisserman, A., 165, 217 




